Title

A topic-based multi-channel attention model under hybrid mode for image caption.

Authors

Qian, Kui; Tian, Lei

Abstract

An automatically generated image caption is not closely related to every spatial region of the visual input; rather, it is always related to the topic that the image expresses. To address the decoupling between visual spatial feature attention and the semantic decoder, a topic-based multi-channel attention model (TMA) under hybrid mode for image captioning is proposed. First, natural language processing (NLP) techniques are used to preprocess the reference captions: stop words are filtered, word frequencies are analyzed, and a semantic network graph with node labels is constructed. Then, using image features extracted by a convolutional neural network (CNN), a semantic perception network is designed to perform cross-domain prediction from image to topic. Next, a topic-based multi-channel attention fusion mechanism is proposed to produce a fused image-text attention representation from three jointly used inputs: the global spatial features of the image, the local semantic features of the graph nodes, and the hidden-layer features of the long short-term memory (LSTM) decoder. Finally, a multi-task loss function is used to train the TMA. Experimental results show that the proposed model, with its topic-focused attention, achieves better evaluation performance than state-of-the-art (SOTA) methods.
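The caption preprocessing step (stop-word filtering, word-frequency analysis, labeled semantic graph) can be illustrated with a small sketch. The record does not say how the TMA graph's edges are defined, so this example assumes a word co-occurrence graph: frequent words become labeled nodes, and an edge counts how many captions contain both words. The stop-word list, `min_freq` threshold, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "with", "and", "to"}

def tokenize(caption):
    """Lower-case a caption, strip simple punctuation, and drop stop words."""
    words = caption.lower().replace(",", " ").replace(".", " ").split()
    return [w for w in words if w not in STOP_WORDS]

def build_semantic_graph(captions, min_freq=2):
    """Return (node_labels, edges) for an assumed word co-occurrence graph.

    node_labels maps each frequent word to an integer label; edges maps a
    pair of node labels to the number of captions containing both words.
    """
    token_lists = [tokenize(c) for c in captions]
    freq = Counter(w for tokens in token_lists for w in tokens)
    # Only words seen at least min_freq times become graph nodes.
    vocab = sorted(w for w, n in freq.items() if n >= min_freq)
    node_labels = {w: i for i, w in enumerate(vocab)}
    edges = Counter()
    for tokens in token_lists:
        present = sorted({node_labels[w] for w in tokens if w in node_labels})
        for u, v in combinations(present, 2):
            edges[(u, v)] += 1
    return node_labels, dict(edges)

captions = [
    "A dog runs on the grass.",
    "The dog plays with a ball on the grass.",
    "A man throws a ball to the dog.",
]
labels, edges = build_semantic_graph(captions)
print(labels)
print(edges)
```

Raising `min_freq` shrinks the graph to the most topic-bearing words, which is one plausible way such a graph could be kept compact.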
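The abstract names three inputs to the topic-based multi-channel attention fusion: global spatial image features, local semantic features of graph nodes, and the LSTM hidden state. It does not give the equations, so the following NumPy sketch is only an illustration under stated assumptions, not the TMA implementation: image-to-topic prediction is modeled as a sigmoid over mean-pooled CNN features, each channel uses standard additive (Bahdanau-style) attention, and fusion is a concatenation followed by a projection. All sizes and weights are random and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys, W_q, W_k, w):
    """Additive attention: score each key against the query, return context and weights."""
    scores = np.tanh(keys @ W_k + query @ W_q) @ w  # shape: (num_keys,)
    alpha = softmax(scores)
    return alpha @ keys, alpha

def channel_params(d, a):
    """Random (untrained) parameters for one attention channel."""
    return rng.standard_normal((d, a)), rng.standard_normal((d, a)), rng.standard_normal(a)

# Hypothetical sizes: feature dim, CNN grid regions, graph nodes, attention dim.
d, k, m, a = 8, 49, 5, 16
V = rng.standard_normal((k, d))  # global spatial features (e.g. a 7x7 CNN grid)
S = rng.standard_normal((m, d))  # embeddings of the semantic-graph nodes
h = rng.standard_normal(d)       # current LSTM decoder hidden state

# Assumed image-to-topic prediction: sigmoid over a linear map of mean-pooled CNN features.
W_topic = rng.standard_normal((d, m))
topic_probs = 1.0 / (1.0 + np.exp(-(V.mean(axis=0) @ W_topic)))
S_topic = S * topic_probs[:, None]  # scale each node by its predicted topic probability

# One attention channel per feature source, each with its own parameters.
c_vis, alpha_vis = attend(h, V, *channel_params(d, a))
c_sem, alpha_sem = attend(h, S_topic, *channel_params(d, a))

# Fuse hidden state and both context vectors by concatenation plus a projection.
W_fuse = rng.standard_normal((3 * d, d))
fused = np.tanh(np.concatenate([h, c_vis, c_sem]) @ W_fuse)
print(fused.shape, alpha_vis.shape, alpha_sem.shape)
```

Scaling the node embeddings by predicted topic probabilities is one simple way to make the semantic channel "topic-focused"; gated or learned channel weights would be equally reasonable choices.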
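The record says only that a multi-task loss function is used to train the TMA. A common form of such a loss, shown here as an assumption rather than the paper's definition, adds a token-level caption cross-entropy to a weighted multi-label binary cross-entropy for topic prediction; the balancing weight `lam` and all function names are hypothetical.

```python
import numpy as np

def caption_xent(logits, targets):
    """Mean token-level cross-entropy. logits: (T, vocab); targets: (T,) word ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def topic_bce(topic_logits, topic_labels, eps=1e-12):
    """Mean binary cross-entropy for multi-label topic prediction."""
    p = 1.0 / (1.0 + np.exp(-topic_logits))
    return -np.mean(topic_labels * np.log(p + eps) + (1 - topic_labels) * np.log(1 - p + eps))

def multitask_loss(logits, targets, topic_logits, topic_labels, lam=0.5):
    """Caption loss plus lam times topic loss; lam is a hypothetical balancing weight."""
    return caption_xent(logits, targets) + lam * topic_bce(topic_logits, topic_labels)

# Uniform predictions: 3 tokens over a 4-word vocabulary, 5 topics.
logits = np.zeros((3, 4))
targets = np.array([0, 2, 3])
topic_logits = np.zeros(5)
topic_labels = np.array([1.0, 0.0, 1.0, 0.0, 0.0])
loss = multitask_loss(logits, targets, topic_logits, topic_labels)
print(loss)
```

With uniform predictions the caption term equals log 4 and the topic term equals log 2, so the total is log 4 + 0.5 log 2; tuning `lam` trades caption fluency against topic accuracy.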

Subjects

Natural language processing; Convolutional neural networks; Word frequency; Graph labelings

Publication

Neural Computing & Applications, 2022, Vol 34, Issue 3, p2207

ISSN

0941-0643

Publication type

Academic Journal

DOI

10.1007/s00521-021-06557-8
