CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval
Zijie Wang, Aichun Zhu, Jingyi Xue, Xili Wan, Chao Liu, Tian Wang, Yifeng Li

TL;DR
This paper introduces CAIBC, a multi-branch architecture that captures comprehensive visual information beyond color for text-based person retrieval, addressing over-reliance on color and improving retrieval accuracy.
Contribution
The paper proposes a novel multi-branch framework with mutual learning to incorporate all-round visual cues, surpassing existing methods in text-based person retrieval.
Findings
Outperforms existing methods on CUHK-PEDES and RSTPReid datasets.
Achieves state-of-the-art results in supervised and weakly supervised settings.
Effectively balances color and non-color information for better retrieval.
Abstract
Given a natural language description, text-based person retrieval aims to identify images of a target person from a large-scale person image database. Existing methods generally face a color over-reliance problem: the models rely heavily on color information when matching cross-modal data. Color information is indeed an important basis for retrieval decisions, but over-reliance on it distracts the model from other key clues (e.g., texture and structural information) and thereby leads to sub-optimal retrieval performance. To solve this problem, we propose to Capture All-round Information Beyond Color (CAIBC) via a jointly optimized multi-branch architecture for text-based person retrieval. CAIBC contains three branches including an RGB branch, a grayscale…
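As a rough illustration of what "beyond color" can mean at the input level, the sketch below converts an RGB image to a single-channel grayscale image, which keeps texture and structure but discards hue. This is only a minimal sketch: the function name `to_grayscale`, the nested-list image representation, and the ITU-R BT.601 luma weights are assumptions made for illustration, not details taken from the paper, which may prepare the grayscale branch's input differently.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255

# ITU-R BT.601 luma weights. This is an assumed, commonly used choice;
# the abstract does not say which conversion CAIBC uses.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Map each RGB pixel to one luminance value, dropping hue information."""
    return [
        [round(sum(w * c for w, c in zip(LUMA_WEIGHTS, pixel))) for pixel in row]
        for row in image
    ]


if __name__ == "__main__":
    # Hypothetical 1x3 image: a pure red, a pure green, and a mid-gray pixel.
    tiny_image = [[(255, 0, 0), (0, 255, 0), (100, 100, 100)]]
    print(to_grayscale(tiny_image))
```

After this conversion, the distinction between a red and a green garment survives only as a brightness difference, so a model fed such input has to lean on cues like texture and shape rather than hue.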
