- Title
Learning shared features from specific and ambiguous descriptions for text-based person search.
- Authors
Cheng, Ke; Geng, Qikai; Huang, Shucheng; Tu, Juanjuan; Lu, Hu
- Abstract
Text-based person search endeavors to utilize natural language descriptions for retrieving pedestrian images. Previous studies have primarily focused on leveraging information among pedestrians with distinct identities, overlooking the exploration of data variations within the same identity. Although some have attempted to extract multiple samples for each identity, an appropriate loss function was not employed. In response to this research gap, we present LFSA, a concise cross-modal framework that Learns shared Features from Specific and Ambiguous descriptions. First, building upon a distinctive sampling strategy, we formulate the Boundary Constraints Loss (BCL) and the Hard Sample Mining Loss (HSML) with the aim of extracting unique features from specific descriptions while simultaneously capturing shared features from ambiguous descriptions. Then, we introduce a textual augmentation module denoted as Mask-Delete-Replace (MDR). This module employs three operations to direct the model’s attention toward more comprehensive details within the textual descriptors. LFSA utilizes CLIP as the backbone of the network, leveraging only its global features from the [CLS] token. Extensive experiments on two benchmark datasets, CUHK-PEDES and ICFG-PEDES, demonstrate the effectiveness of our approach. Code is available at .
- Publication
Multimedia Systems, 2024, Vol 30, Issue 2, p1
- ISSN
0942-4962
- Publication type
Article
- DOI
10.1007/s00530-024-01286-z
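
The abstract names the three operations of the Mask-Delete-Replace (MDR) module but does not specify how they are applied. The following is a minimal Python sketch of one plausible token-level reading, not the authors' implementation. The function name `mask_delete_replace`, the `[MASK]` placeholder string, the per-token probability `p`, the uniform choice among the three operations, and the replacement vocabulary are all assumptions made for illustration.

```python
import random

# Hypothetical placeholder; the paper's actual mask token is not given in the abstract.
MASK_TOKEN = "[MASK]"


def mask_delete_replace(tokens, vocab, p=0.15, rng=None):
    """Randomly mask, delete, or replace tokens of a description.

    Each token is selected for augmentation with probability p (an assumed
    hyperparameter). A selected token undergoes one of the three operations,
    chosen uniformly at random (also an assumption).
    """
    if rng is None:
        rng = random.Random()
    augmented = []
    for token in tokens:
        if rng.random() >= p:
            # Token not selected: keep it unchanged.
            augmented.append(token)
            continue
        operation = rng.choice(("mask", "delete", "replace"))
        if operation == "mask":
            augmented.append(MASK_TOKEN)
        elif operation == "replace":
            augmented.append(rng.choice(vocab))
        # "delete": the token is simply dropped.
    return augmented


if __name__ == "__main__":
    description = "a man wearing a red jacket and black jeans".split()
    vocabulary = ["blue", "coat", "woman", "backpack"]
    print(mask_delete_replace(description, vocabulary, p=0.3, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes the augmentation reproducible, which is convenient when comparing training runs.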