- Title
Multimodal 3D Visual Grounding with Semantic Consistency Constraints and Local-Global Awareness (基于语义一致性约束与局部-全局感知的多模态3D视觉定位)
- Authors
罗寒; 马浩统; 刘杰; 严华; 雷印杰
- Abstract
The scarcity of 3D multimodal data leads to a lack of semantic consistency between text and visual features when models are trained with traditional supervised methods. Traditional methods also overlook local relationships and global information, resulting in poor performance. To address these issues, this paper proposes a multimodal 3D visual grounding method based on semantic consistency constraints and local-global awareness. First, the method helps the 3D model extract semantically consistent point cloud-text features by distilling knowledge from a pre-trained 2D vision-language model. Second, it designs a local-global aware module that continuously supplements and enhances candidate target features to match targets more accurately. Experiments on the ScanRefer dataset show that the proposed method achieves 50.53% Acc@0.25IoU and 37.67% Acc@0.5IoU, exceeding most existing 3D visual grounding methods and confirming its effectiveness.
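The abstract's first step distills knowledge from a frozen 2D vision-language model into the 3D branch to enforce point cloud-text semantic consistency. The paper does not specify the loss here, so the sketch below assumes a common choice: a cosine-distance feature-alignment loss between the 3D student features and the 2D teacher features (function and variable names are illustrative, not from the paper).

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Normalize feature vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def consistency_distill_loss(point_feats, vlm_feats):
    """Hypothetical distillation loss: mean cosine distance between
    3D point-cloud features (student) and frozen 2D vision-language
    model features (teacher). 0 when directions match exactly."""
    p = l2_normalize(np.asarray(point_feats, dtype=np.float64))
    t = l2_normalize(np.asarray(vlm_feats, dtype=np.float64))
    return float(np.mean(1.0 - np.sum(p * t, axis=-1)))

# Aligned features give zero loss; orthogonal features give loss 1.
aligned = consistency_distill_loss([[1.0, 0.0]], [[2.0, 0.0]])
orthogonal = consistency_distill_loss([[1.0, 0.0]], [[0.0, 3.0]])
```

Minimizing such a loss only pulls the student's feature directions toward the teacher's, so the 3D branch inherits the teacher's text-aligned embedding space without needing paired 3D-text supervision.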
- Publication
Application Research of Computers / Jisuanji Yingyong Yanjiu, 2024, Vol 41, Issue 7, p2203
- ISSN
1001-3695
- Publication type
Article
- DOI
10.19734/j.issn.1001-3695.2023.09.0515