Conference papers

Language-Driven Region Pointer Advancement for Controllable Image Captioning

Annika Lindh, Technological University Dublin
Robert J. Ross, Technological University Dublin
John D. Kelleher, Technological University Dublin

Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences

Publication Details

This paper was presented at the 28th International Conference on Computational Linguistics (COLING 2020).

Abstract

Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model implementing this technique improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size.

DOI

https://doi.org/10.18653/v1/2020.coling-main.174

Recommended Citation

Lindh, A., Ross, R. J. & Kelleher, J. D. (2020). Language-Driven Region Pointer Advancement for Controllable Image Captioning. In Proceedings of the 28th International Conference on Computational Linguistics, pages 1922–1935, Barcelona, Spain (Online), December. International Committee on Computational Linguistics.

Funder

ADAPT SFI Research Centre

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons
