Prudviraj, J., Mohan, C. K., et al. (2021)
Attentive Contextual Network for Image Captioning.
In: 2021 International Joint Conference on Neural Networks (IJCNN), 18-22 July 2021, Virtual, Shenzhen.
Abstract
Existing image captioning approaches fail to generate fine-grained captions because they lack a rich encoded representation of the image. In this paper, we present an attentive contextual network (ACN) that learns spatially transformed image features and dense multi-scale contextual information to generate semantically meaningful captions. First, we construct a deformable network on the intermediate layers of a convolutional neural network (CNN) to cultivate spatially invariant features. Multi-scale contextual features are then produced by a contextual network applied on top of the last CNN layers. Next, we apply an attention mechanism to the contextual network to extract dense contextual features. The extracted spatial and contextual features are combined to encode a holistic representation of the image. Finally, a multi-stage caption decoder with a visual attention module generates fine-grained captions. We demonstrate the performance of the proposed approach on COCO, the largest image captioning dataset.
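The abstract describes attending over multi-scale contextual features and fusing the result with spatial features into a single holistic encoding. The following is a minimal NumPy sketch of that fusion step only, not the authors' implementation: the pooled per-scale vectors, the query vector, and the simple dot-product attention are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for attention weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_contexts(context_feats, query):
    """Weight K per-scale context vectors (K, D) by a query (D,).

    Dot-product attention is an illustrative stand-in for the
    paper's attention mechanism, whose exact form is not given
    in the abstract.
    """
    scores = context_feats @ query / np.sqrt(context_feats.shape[1])
    weights = softmax(scores)
    return weights @ context_feats, weights

def encode_image(spatial_feat, context_feats, query):
    # Fuse the spatially transformed feature with the attended
    # multi-scale context into one holistic representation
    # (concatenation is an assumed fusion choice).
    attended, weights = attend_contexts(context_feats, query)
    return np.concatenate([spatial_feat, attended]), weights

rng = np.random.default_rng(0)
D, K = 8, 3                              # feature dim, number of scales
spatial = rng.standard_normal(D)         # stand-in deformable-network output
contexts = rng.standard_normal((K, D))   # stand-in multi-scale context vectors
query = rng.standard_normal(D)
encoding, weights = encode_image(spatial, contexts, query)
# encoding has length 2*D; attention weights sum to 1
```

In the full model, the resulting encoding would condition the multi-stage caption decoder; here it is only shown as a fixed-length vector.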