Revisiting image captioning structures based on CNN and RNN, and improving the performance using modified decoders with residual connections
dc.contributor | Graduate Program in Electrical and Electronic Engineering. | |
dc.contributor.advisor | Anarım, Emin. | |
dc.contributor.author | Saraçoğlu, Sinan. | |
dc.date.accessioned | 2025-04-14T12:25:50Z | |
dc.date.available | 2025-04-14T12:25:50Z | |
dc.date.issued | 2023 | |
dc.description.abstract | In this thesis, the image captioning structure consisting of a Convolutional Neural Network (CNN) as the encoder and a Recurrent Neural Network (RNN) as the decoder is revisited by comparing and evaluating the effects of different image feature extractors, different RNN cells, different types of word embeddings, and residual connections between the RNN cells. The famous “Show, Attend and Tell” model is modified by adding residual connections between the RNN cells, along with other modifications on both the encoder and the decoder side, which improves the performance of the model on the image captioning task. Furthermore, models were trained with three different pre-trained word embeddings and their benefits were explored. With the best model, an improvement of 34 BLEU-4 points and 15 SPICE points over the base model was achieved. The effect of training the best model on images transformed into the frequency domain, rather than on images represented in the spatial domain, is also investigated, and it is concluded that this approach does not enhance the performance of the model. The results of the experiments demonstrate the effectiveness of the proposed modifications and provide insights into the potential of residual connections. | |
dc.format.pages | xiii, 74 leaves | |
dc.identifier.other | Graduate Program in Electrical and Electronic Engineering. TKL 2023 U68 PhD (Thes SOC 2023 T74 | |
dc.identifier.uri | https://digitalarchive.library.bogazici.edu.tr/handle/123456789/21539 | |
dc.publisher | Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023. | |
dc.subject.lcsh | Convolutions (Mathematics) | |
dc.subject.lcsh | Recurrent sequences (Mathematics) | |
dc.title | Revisiting image captioning structures based on CNN and RNN, and improving the performance using modified decoders with residual connections |