Incorporating attentive multi-scale context information for image captioning

Prudviraj, Jeripothula and Sravani, Yenduri and Mohan, C. K. (2022) Incorporating attentive multi-scale context information for image captioning. Multimedia Tools and Applications. ISSN 1380-7501

Full text not available from this repository.

Abstract

In this paper, we propose a novel encoding framework that learns multi-scale context information about the visual scene for the image captioning task. The multi-scale context information comprises spatial, semantic, and instance-level features of an input image. We draw spatial features from early convolutional layers, and obtain multi-scale semantic features by employing a feature pyramid network on top of deep convolutional neural networks. We then concatenate the spatial and multi-scale semantic features to capture fine-to-coarse details of the visual scene. Further, the instance-level features are captured by applying a bilinear interpolation technique to the fused representation, so that it retains the object-level semantics of the image. We apply an attention mechanism to the resulting features to guide the caption decoding module. In addition, we explore various combinations of encoding techniques to acquire global and local features of an image. The efficacy of the proposed approaches is demonstrated on the COCO dataset. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
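The encoding steps named in the abstract (concatenating features from different scales, bilinear interpolation, and attention over the result) can be illustrated with a small standard-library Python sketch. This is not the authors' implementation, and the full text is not available here: the function names, the toy feature maps, the use of bilinear interpolation to bring a coarse map to the fine map's resolution before concatenation, and the dot-product form of attention are all assumptions made for illustration.

```python
import math


def bilinear_resize(grid, out_h, out_w):
    """Resize a 2-D grid (list of rows) to out_h x out_w by bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map the output row to a fractional input row (corner-aligned).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
            bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def fuse(spatial, semantic):
    """Concatenate channel-wise. Each argument is a list of channels (2-D grids).

    Assumption: semantic channels are resized to the spatial resolution first.
    """
    h, w = len(spatial[0]), len(spatial[0][0])
    return spatial + [bilinear_resize(ch, h, w) for ch in semantic]


def soft_attention(features, query):
    """Dot-product soft attention (assumed form): returns (weights, context)."""
    scores = [sum(f_k * q_k for f_k, q_k in zip(f, query)) for f in features]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    context = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return weights, context


# Toy inputs (hypothetical): one fine 2x2 channel and one coarse 1x1 channel.
spatial = [[[1.0, 2.0], [3.0, 4.0]]]
semantic = [[[10.0]]]
fused = fuse(spatial, semantic)

# One feature vector per spatial location, then attend with a decoder query.
locations = [[ch[i][j] for ch in fused] for i in range(2) for j in range(2)]
weights, context = soft_attention(locations, [0.0, 0.0])  # zero query gives uniform weights
```

In a real captioning model the query would come from the decoder state at each time step, and the feature maps would be tensors produced by a convolutional backbone rather than nested lists.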

IITH Creators: Mohan, C. K. (ORCiD: https://orcid.org/0000-0002-7316-0836)
Item Type: Article
Uncontrolled Keywords: Image captioning; Image encoding mechanism; Multi-scale context information; Visual attention
Subjects: Computer science
Divisions: Department of Computer Science & Engineering
Depositing User: LibTrainee 2021
Date Deposited: 26 Jul 2022 11:58
Last Modified: 26 Jul 2022 11:58
URI: http://raiithold.iith.ac.in/id/eprint/9941
Publisher URL: http://doi.org/10.1007/s11042-021-11895-9
OA policy: https://v2.sherpa.ac.uk/id/publication/17294
