Revisiting image captioning structures based on CNN and RNN, and improving the performance using modified decoders with residual connections

dc.contributor: Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor: Anarım, Emin.
dc.contributor.author: Saraçoğlu, Sinan.
dc.date.accessioned: 2025-04-14T12:25:50Z
dc.date.available: 2025-04-14T12:25:50Z
dc.date.issued: 2023
dc.description.abstract: In this thesis, the image captioning structure consisting of a Convolutional Neural Network (CNN) as the encoder and a Recurrent Neural Network (RNN) as the decoder is revisited by comparing and evaluating the effects of different image feature extractors, different RNN cells, different types of word embeddings, and residual connections between the RNN cells. The well-known “Show, Attend and Tell” model is modified by adding residual connections between the RNN cells, along with other modifications to both the encoder and the decoder, which improve the model's performance on the image captioning task. Furthermore, models were trained with three different pre-trained word embeddings, and their benefits were explored. With the best model, an improvement of 34 BLEU-4 points and 15 SPICE points was achieved over the base model. The effect of training the best model on images transformed into the frequency domain, rather than on images represented in the spatial domain, is also investigated, and it is concluded that this approach cannot enhance the model's performance. The experimental results demonstrate the effectiveness of the proposed modifications and provide insight into the potential of residual connections.
dc.format.pages: xiii, 74 leaves
dc.identifier.other: Graduate Program in Electrical and Electronic Engineering. TKL 2023 U68 PhD (Thes SOC 2023 T74
dc.identifier.uri: https://digitalarchive.library.bogazici.edu.tr/handle/123456789/21539
dc.publisher: Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023.
dc.subject.lcsh: Convolutions (Mathematics)
dc.subject.lcsh: Recurrent sequences (Mathematics)
dc.title: Revisiting image captioning structures based on CNN and RNN, and improving the performance using modified decoders with residual connections

Files

Original bundle
Name: b2795548.038345.001.PDF
Size: 3.32 MB
Format: Adobe Portable Document Format