- Title
Word-to-region attention network for visual question answering.
- Authors
Peng, Liang; Yang, Yang; Bin, Yi; Xie, Ning; Shen, Fumin; Ji, Yanli; Xu, Xing
- Abstract
Visual attention, which allows more concentration on the image regions that are relevant to a reference question, brings remarkable performance improvement in Visual Question Answering (VQA). Most VQA attention models employ the entire reference question representation to query relevant image regions. Nonetheless, only certain salient words of the question play an effective role in an attention operation. In this paper, we propose a novel Word-to-Region Attention Network (WRAN), which can 1) simultaneously locate pertinent object regions, instead of a uniform grid of image regions of equal size, and identify the corresponding words of the reference question; as well as 2) enforce consistency between image object regions and the core semantics of questions. We evaluate the proposed model on the VQA v1.0 and VQA v2.0 datasets. Experimental results demonstrate the superiority of the proposed model compared to state-of-the-art methods.
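The abstract describes an attention mechanism in which individual question words, rather than the whole question vector, query detected object regions. The sketch below is a minimal, hypothetical illustration of such word-to-region attention in PyTorch; the layer names, dimensions, and the word-salience gating are assumptions for illustration and do not reproduce the authors' actual WRAN architecture.

```python
# Hypothetical sketch of word-to-region attention (not the authors' code):
# each question word attends over detected object-region features, and the
# word-weighted region summaries are pooled into a joint representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordToRegionAttention(nn.Module):
    def __init__(self, word_dim=300, region_dim=2048, hidden_dim=512):
        super().__init__()
        self.word_proj = nn.Linear(word_dim, hidden_dim)
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        self.word_gate = nn.Linear(word_dim, 1)  # salience score per word (assumed)

    def forward(self, words, regions):
        # words:   (batch, n_words, word_dim),   e.g. word embeddings
        # regions: (batch, n_regions, region_dim), e.g. object-detector features
        w = self.word_proj(words)                        # (B, T, H)
        r = self.region_proj(regions)                    # (B, K, H)
        # word-to-region affinity: each word scores every object region
        scores = torch.bmm(w, r.transpose(1, 2))         # (B, T, K)
        region_att = F.softmax(scores, dim=-1)           # attention over regions per word
        attended = torch.bmm(region_att, r)              # (B, T, H) region summary per word
        # salient-word weights: only certain words should drive the attention
        word_att = F.softmax(self.word_gate(words), dim=1)   # (B, T, 1)
        fused = (word_att * attended * w).sum(dim=1)     # (B, H) joint representation
        return fused

# Usage with random tensors standing in for question/image features
model = WordToRegionAttention()
q = torch.randn(2, 14, 300)    # 2 questions, 14 words each
v = torch.randn(2, 36, 2048)   # 36 detected object regions per image
print(model(q, v).shape)       # torch.Size([2, 512])
```

In this sketch, the elementwise product of the word-level attention, the attended region features, and the projected word features is one simple way to encourage agreement between object regions and the question's salient words; the paper's actual consistency mechanism may differ.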
- Subjects
QUESTION answering systems; ARTIFICIAL intelligence; MACHINE learning; VISUAL perception; SEMANTICS
- Publication
Multimedia Tools & Applications, 2019, Vol 78, Issue 3, p3843
- ISSN
1380-7501
- Publication type
Article
- DOI
10.1007/s11042-018-6389-3