- Title
Task-like training paradigm in CLIP for zero-shot sketch-based image retrieval.
- Authors
Zhang, Haoxiang; Cheng, Deqiang; Jiang, He; Liu, Jingjing; Kou, Qiqi
- Abstract
The Contrastive Language-Image Pre-training model (CLIP) has recently gained attention in the zero-shot domain. However, it still falls short in addressing cross-modal perception and the semantic gap between seen and unseen classes in Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR). To overcome these obstacles, we propose a Task-Like Training paradigm (TLT). In this work, we view cross-modal perception and the semantic gap as a multi-task learning process. Before tackling these challenges, we fully exploit CLIP's text encoder and propose a text-based identification learning mechanism that helps the model quickly learn discriminative features. Next, we propose text prompt tutoring and cross-modal consistency learning to address cross-modal perception and the semantic gap, respectively. Meanwhile, we present a collaborative architecture to explore the potential shared information between tasks. Extensive results show that our approach significantly outperforms state-of-the-art methods on the Sketchy, Sketchy-No, Tuberlin, and QuickDraw datasets.
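For context, a minimal sketch (not from the paper, and independent of the proposed TLT method) of how CLIP-style retrieval scoring works in ZS-SBIR: a sketch query and a gallery of photo embeddings are L2-normalized and ranked by cosine similarity. Random vectors stand in here for the outputs of CLIP's encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings standing in for CLIP encoder outputs
# (in practice: image features for the gallery, sketch features for the query).
gallery = rng.standard_normal((5, 512))  # 5 gallery photos, 512-d features
query = rng.standard_normal((1, 512))    # 1 query sketch

# L2-normalize so a dot product equals cosine similarity.
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
query /= np.linalg.norm(query, axis=1, keepdims=True)

# Rank gallery photos by similarity to the sketch query (best first).
scores = (query @ gallery.T).ravel()
ranking = np.argsort(-scores)
print(ranking)
```

Retrieval metrics such as mAP and Prec@k are then computed over this ranking against the class labels of the gallery images.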
- Subjects
IMAGE retrieval; TUTORS & tutoring
- Publication
Multimedia Tools & Applications, 2024, Vol 83, Issue 19, p57811
- ISSN
1380-7501
- Publication type
Article
- DOI
10.1007/s11042-023-17675-x