ICON: Invariant Counterfactual Optimization with Neuro-Symbolic Priors for Text-Based Person Search
Xiangyu Wang, Zhixin Lv, Yongjiao Sun, Anrui Han, Ye Yuan, Hangxu Ji

TL;DR
This paper introduces ICON, a neuro-symbolic framework for Text-Based Person Search that enhances robustness and invariance to spatial, background, and noise variations by integrating causal and topological priors.
Contribution
The paper proposes a novel neuro-symbolic approach with multiple interventions and regularizations to improve robustness and causal invariance in TBPS models.
Findings
ICON achieves state-of-the-art performance on benchmarks.
ICON shows enhanced robustness against occlusion and background interference.
It is effective in maintaining geometric invariance and environmental independence.
Abstract
Text-Based Person Search (TBPS) holds unique value in real-world surveillance by bridging visual perception and language understanding, yet current paradigms built on pre-trained models often fail to transfer effectively to complex open-world scenarios. Their reliance on "Passive Observation" leads to multifaceted spurious correlations and spatial semantic misalignment, causing a lack of robustness against distribution shifts. To resolve these defects, this paper proposes ICON (Invariant Counterfactual Optimization with Neuro-symbolic priors), a framework integrating causal and topological priors. First, we introduce Rule-Guided Spatial Intervention to strictly penalize sensitivity to bounding box noise, forcibly severing location shortcuts to achieve geometric invariance. Second, Counterfactual Context Disentanglement is implemented via semantic-driven background…
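The abstract describes penalizing sensitivity to bounding box noise but, as truncated here, does not give the formulation. The sketch below is only one plausible reading, not the paper's method: it assumes the penalty is the mean squared change in an embedding when the box edges are randomly jittered. The names `jitter_box`, `sensitivity_penalty`, and `embed`, the (x1, y1, x2, y2) box layout, and the uniform jitter are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical box layout: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def jitter_box(box: Box, scale: float, rng: random.Random) -> Box:
    """Shift each box edge by up to `scale` times the box width or height."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (
        x1 + rng.uniform(-scale, scale) * w,
        y1 + rng.uniform(-scale, scale) * h,
        x2 + rng.uniform(-scale, scale) * w,
        y2 + rng.uniform(-scale, scale) * h,
    )


def sensitivity_penalty(
    embed: Callable[[Box], Sequence[float]],
    box: Box,
    scale: float = 0.1,
    n_samples: int = 8,
    seed: int = 0,
) -> float:
    """Mean squared embedding change under random box jitter.

    Assumption: this stands in for a "penalize sensitivity to bounding
    box noise" term; the paper's actual loss may differ.
    """
    rng = random.Random(seed)
    reference = embed(box)
    total = 0.0
    for _ in range(n_samples):
        perturbed = embed(jitter_box(box, scale, rng))
        total += sum((a - b) ** 2 for a, b in zip(reference, perturbed))
    return total / n_samples


if __name__ == "__main__":
    box = (10.0, 20.0, 50.0, 120.0)
    # Toy embedding that ignores the box: the penalty is exactly 0.0.
    print(sensitivity_penalty(lambda b: [1.0, 2.0], box))
    # Toy embedding that leaks box coordinates: the penalty is positive.
    print(sensitivity_penalty(lambda b: list(b), box))
```

In a training setting a term like this would be added to the retrieval loss, so that an encoder relying on location shortcuts pays a cost while a geometry-invariant encoder does not.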
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Multimodal Machine Learning Applications
