Document Type
Conference Paper
Rights
Available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
Disciplines
Computer Sciences
Abstract
Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty.
We make our source code publicly available online: https://github.com/AnnikaLindh/Diverse_and_Specific_Image_Captioning
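To illustrate the core idea of the abstract, the sketch below shows one generic way a fixed Image Retrieval model can supply a training signal to a caption generator: sample a caption, score how well it retrieves its own image, and reinforce captions the retriever ranks highly. This is a minimal PyTorch sketch, not the authors' implementation; the module names (CaptionGenerator, RetrievalModel), all dimensions, and the REINFORCE-style policy gradient are illustrative assumptions, and the paper's actual optimization may differ (see the linked repository for the real code).

    # Hedged sketch: retrieval score as an unsupervised reward for captioning.
    # All module names and sizes are hypothetical placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CaptionGenerator(nn.Module):
        """Toy stand-in for a caption decoder over a small vocabulary."""
        def __init__(self, vocab_size=100, img_dim=64, hidden=64, max_len=10):
            super().__init__()
            self.max_len = max_len
            self.proj = nn.Linear(img_dim, hidden)
            self.rnn = nn.GRUCell(hidden, hidden)
            self.out = nn.Linear(hidden, vocab_size)
            self.embed = nn.Embedding(vocab_size, hidden)

        def sample(self, img_feat):
            """Sample a caption; return token ids and their log-probabilities."""
            h = torch.tanh(self.proj(img_feat))
            tok = torch.zeros(img_feat.size(0), dtype=torch.long)  # <BOS> = 0
            ids, logps = [], []
            for _ in range(self.max_len):
                h = self.rnn(self.embed(tok), h)
                dist = torch.distributions.Categorical(logits=self.out(h))
                tok = dist.sample()
                ids.append(tok)
                logps.append(dist.log_prob(tok))
            return torch.stack(ids, 1), torch.stack(logps, 1)

    class RetrievalModel(nn.Module):
        """Toy stand-in: embeds images and captions into a shared space."""
        def __init__(self, vocab_size=100, img_dim=64, emb=32):
            super().__init__()
            self.img_enc = nn.Linear(img_dim, emb)
            self.txt_enc = nn.EmbeddingBag(vocab_size, emb)  # mean-pooled bag of words

        def score(self, img_feat, caption_ids):
            """Cosine similarity between image and caption embeddings."""
            i = F.normalize(self.img_enc(img_feat), dim=-1)
            t = F.normalize(self.txt_enc(caption_ids), dim=-1)
            return (i * t).sum(-1)

    generator, retriever = CaptionGenerator(), RetrievalModel()
    opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

    img_feat = torch.randn(8, 64)      # stand-in batch of image features
    ids, logps = generator.sample(img_feat)
    with torch.no_grad():              # the retriever is a fixed scoring model
        reward = retriever.score(img_feat, ids)
    baseline = reward.mean()           # simple variance-reduction baseline
    # REINFORCE-style loss: raise log-probs of captions the retriever ranks highly.
    loss = -((reward - baseline).unsqueeze(1) * logps).sum(1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

Because the reward measures how distinctively a caption identifies its own image, optimizing it pushes the generator away from generic captions that match many images, which is the specificity objective the abstract describes.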
DOI
https://doi.org/10.1007/978-3-030-01418-6_18
Recommended Citation
Lindh, A., Ross, R. J., Mahalunkar, A., Salton, G., & Kelleher, J. D. (2018). Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning. In Artificial Neural Networks and Machine Learning – ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part I. Springer International Publishing.
Funder
ADAPT Centre for Digital Content Technology
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Publication Details
Artificial Neural Networks and Machine Learning – ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part I. Springer International Publishing (2018). LNCS 11139. ISBN 978-3-030-01418-6.