- Title
Multimodal 3D Visual Grounding with Semantic Consistency Constraints and Local-Global Awareness (基于语义一致性约束与局部-全局感知的多模态3D视觉定位)
- Authors
罗寒; 马浩统; 刘杰; 严华; 雷印杰
- Abstract
The scarcity of 3D multimodal data leads to a lack of semantic consistency between text and visual features when models are trained with traditional supervised methods. Traditional methods also overlook local relationships and global information, resulting in poor performance. To address these issues, this paper proposes a multimodal 3D visual grounding method based on semantic consistency constraints and local-global awareness. First, the method helps the 3D model extract semantically consistent point cloud-text features by distilling knowledge from a pre-trained 2D vision-language model. Second, it designs a local-global aware module that continuously supplements and enhances candidate target features to match targets more accurately. Experiments on the ScanRefer dataset show that the proposed method achieves 50.53% Acc@0.25IoU and 37.67% Acc@0.5IoU, exceeding most existing 3D visual grounding methods and confirming its effectiveness.
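The abstract's first step distills knowledge from a frozen 2D vision-language model into the 3D branch to enforce point cloud-text semantic consistency. The paper does not specify the loss here, so the sketch below assumes a common choice: a cosine-distance feature-alignment loss between the 3D student features and the 2D teacher features (function and variable names are illustrative, not from the paper).

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize feature vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def consistency_distill_loss(point_feats, vlm_feats):
    """Hypothetical distillation loss: mean cosine distance between
    3D point-cloud features (student) and frozen 2D vision-language
    model features (teacher). 0 when directions match exactly."""
    p = l2_normalize(np.asarray(point_feats, dtype=np.float64))
    t = l2_normalize(np.asarray(vlm_feats, dtype=np.float64))
    return float(np.mean(1.0 - np.sum(p * t, axis=-1)))

# Aligned features give zero loss; orthogonal features give loss 1.
aligned = consistency_distill_loss([[1.0, 0.0]], [[2.0, 0.0]])
orthogonal = consistency_distill_loss([[1.0, 0.0]], [[0.0, 3.0]])
```

Minimizing such a loss only pulls the student's feature directions toward the teacher's, so the 3D branch inherits the teacher's text-aligned embedding space without needing paired 3D-text supervision.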
- Publication
Application Research of Computers / Jisuanji Yingyong Yanjiu, 2024, Vol 41, Issue 7, p2203
- ISSN
1001-3695
- Publication type
Article
- DOI
10.19734/j.issn.1001-3695.2023.09.0515