Document Type
Conference Paper
Disciplines
1.2 COMPUTER AND INFORMATION SCIENCE
Abstract
Image Captioning (IC) is the task of generating natural language descriptions for images. Models encode the image using a convolutional neural network (CNN) and generate the caption via a recurrent model or a multi-modal transformer. Success is measured by the similarity between generated captions and human-written “ground-truth” captions, using the CIDEr [14], SPICE [1] and METEOR [2] metrics. While incremental gains have been made on these metrics, little attention has been paid to end-users' preferences regarding the amount of content in captions. Studies with blind and low-vision participants have found that lack of detail is a problem [6, 13, 17], that the preferred amount of content varies between individuals [13], and that individuals also differ in how they weigh correctness against additional, lower-confidence content [9]. We propose a more user-centered approach with an adjustable amount of content based on the number of regions to describe.
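The adjustable-content idea can be illustrated with a toy sketch: pick the top-k detected image regions and describe each one, so the user's choice of k controls how much content the caption carries. The region structure, scores, and the clause-joining strategy below are illustrative assumptions, not the paper's actual model:

```python
# Hypothetical sketch of user-adjustable caption content.
# Each "region" is a stand-in for a detected image region with a
# confidence score and a pre-generated descriptive phrase.

def select_regions(regions, k):
    """Keep the k highest-confidence regions."""
    return sorted(regions, key=lambda r: r["score"], reverse=True)[:k]

def caption(regions, k):
    """Join one clause per selected region into a single caption."""
    chosen = select_regions(regions, k)
    return ", ".join(r["phrase"] for r in chosen)

regions = [
    {"phrase": "a dog on a couch", "score": 0.9},
    {"phrase": "a red pillow", "score": 0.6},
    {"phrase": "a window in the background", "score": 0.4},
]

print(caption(regions, 1))  # a dog on a couch
print(caption(regions, 3))  # a dog on a couch, a red pillow, a window in the background
```

A larger k trades higher coverage for a greater risk of including lower-confidence (possibly incorrect) content, which is exactly the trade-off on which user opinions differ [9].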
DOI
https://doi.org/10.1145/3555776.3577794
Recommended Citation
Lindh, Annika; Ross, Robert J.; and Kelleher, John, "Show, Prefer and Tell: Incorporating User Preferences into Image Captioning" (2023). Conference papers. 409.
https://arrow.tudublin.ie/scschcomcon/409
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.
Publication Details
https://dl.acm.org/doi/pdf/10.1145/3555776.3577794
SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, March 2023, Pages 1139–1142