Dissertations

Evaluating the Performance of Transformer architecture over Attention architecture on Image Captioning

Deepti Balasubramaniam, Technological University Dublin

Document Type

Dissertation

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computer Science (Data Science), 2021.

Abstract

Over the last few decades, computer vision and natural language processing have shown tremendous improvement on tasks such as image captioning, video captioning, and machine translation using deep learning models. However, there has been little research on transformer-based image captioning and on how it performs relative to other models implemented for image captioning. This study designs a simple encoder-decoder model, an attention model, and a transformer model for image captioning using the Flickr8K dataset, and discusses the hyperparameters of each model, the type of pre-trained model used, and how long each model was trained. The captions generated by the attention model and the transformer model are then compared using BLEU score metrics and further analysed through a human evaluation conducted using an intrinsic approach. Statistical tests on the BLEU scores, together with the human evaluation, showed that the transformer model with multi-head attention outperformed the attention model in image captioning.

DOI

https://doi.org/10.21427/E0GA-P612

Recommended Citation

Balasubramaniam, D. (2021). Evaluating the Performance of Transformer architecture over Attention architecture on Image Captioning. Technological University Dublin. DOI: 10.21427/E0GA-P612

Included in

Computer Engineering Commons, Computer Sciences Commons
