CONQUER: Context-Aware Representation with Query Enhancement for Text-Based Person Search

Zequn Xie

arXiv:2601.18625 · cs.CV · January 27, 2026

TL;DR

CONQUER is a two-stage framework for text-based person search: it improves cross-modal alignment during training and refines user queries at inference, which raises retrieval accuracy, especially for ambiguous or incomplete queries.

Contribution

The paper introduces CONQUER, a framework that combines multi-granularity encoding, optimal transport-based matching, and query refinement, advancing the state of the art in text-based person search.
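To make the optimal transport-based matching idea more concrete, here is a minimal sketch of generic entropy-regularised optimal transport (Sinkhorn iterations) between text-token and image-patch embeddings. It is not the paper's context-guided formulation, which this summary does not specify. The function names `sinkhorn_plan` and `ot_similarity`, the uniform marginals, the cosine-based cost, and the values of `eps` and `n_iters` are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularised OT plan between histograms a and b (generic Sinkhorn).

    cost: (n, m) cost matrix; a: (n,) and b: (m,) non-negative weights summing to 1.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows towards marginal a
        v = b / (K.T @ u)              # rescale columns towards marginal b
    return u[:, None] * K * v[None, :]

def ot_similarity(text_tokens, image_patches, eps=0.1):
    """Hypothetical fine-grained score: cosine similarity weighted by the OT plan."""
    t = text_tokens / np.linalg.norm(text_tokens, axis=1, keepdims=True)
    p = image_patches / np.linalg.norm(image_patches, axis=1, keepdims=True)
    sim = t @ p.T                      # (n_tokens, n_patches) cosine similarities
    cost = 1.0 - sim                   # assumed cost: cosine distance
    a = np.full(t.shape[0], 1.0 / t.shape[0])   # assumed uniform token weights
    b = np.full(p.shape[0], 1.0 / p.shape[0])   # assumed uniform patch weights
    plan = sinkhorn_plan(cost, a, b, eps=eps)
    return float((plan * sim).sum())
```

A smaller `eps` pushes the plan towards a sparse, near one-to-one assignment between tokens and patches, while a larger `eps` spreads mass more evenly; which trade-off CONQUER uses is not stated here.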

Findings

01. Outperforms existing methods on CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets.

02. Improves Rank-1 accuracy and mAP in cross-domain and incomplete-query scenarios.

03. Provides a practical, plug-and-play query enhancement module without retraining the backbone.
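For readers unfamiliar with the two metrics named in the findings, the sketch below shows one common way to compute Rank-1 accuracy and mean average precision (mAP) from a query-by-gallery similarity matrix. The function name `rank1_and_map` and the choice to skip queries with no matching gallery image when averaging AP are assumptions; the paper's exact evaluation protocol is not given in this summary.

```python
import numpy as np

def rank1_and_map(sim, query_ids, gallery_ids):
    """Rank-1 accuracy and mAP for retrieval.

    sim: (num_queries, num_gallery) similarity scores, higher is better.
    query_ids, gallery_ids: integer person identities.
    """
    sim = np.asarray(sim, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    order = np.argsort(-sim, axis=1)                   # best match first
    hits = gallery_ids[order] == query_ids[:, None]    # (Q, G) relevance flags

    rank1 = hits[:, 0].mean()                          # top result is correct

    aps = []
    for row in hits:
        n_rel = row.sum()
        if n_rel == 0:
            continue                                   # assumed: skip queries with no match
        precision_at_k = np.cumsum(row) / np.arange(1, row.size + 1)
        aps.append((precision_at_k * row).sum() / n_rel)
    return float(rank1), float(np.mean(aps))

# Toy example: two text queries, three gallery images.
sim = [[0.9, 0.1, 0.8],
       [0.7, 0.2, 0.6]]
r1, m = rank1_and_map(sim, query_ids=[1, 2], gallery_ids=[1, 2, 1])
```

In the toy example the first query ranks both of its matches first (AP = 1), while the second query's only match is ranked last (AP = 1/3), giving Rank-1 = 0.5 and mAP = 2/3.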

Abstract

Text-Based Person Search (TBPS) aims to retrieve pedestrian images from large galleries using natural language descriptions. This task, essential for public safety applications, is hindered by cross-modal discrepancies and ambiguous user queries. We introduce CONQUER, a two-stage framework designed to address these challenges by enhancing cross-modal alignment during training and adaptively refining queries at inference. During training, CONQUER employs multi-granularity encoding, complementary pair mining, and context-guided optimal matching based on Optimal Transport to learn robust embeddings. At inference, a plug-and-play query enhancement module refines vague or incomplete queries via anchor selection and attribute-driven enrichment, without requiring retraining of the backbone. Extensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid demonstrate that CONQUER consistently…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications · Advanced Neural Network Applications