Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search
Zequn Xie, Guijin Luo, Chuxin Wang, Sihang Cai, Tao Jin, Zhou Zhao, and Yixuan Tang

TL;DR
The paper introduces SSDC, a two-stage cascade framework for text-based person anomaly search that pairs fast skeletal-structure filtering with semantic verification and reports state-of-the-art results on the PAB benchmark.
Contribution
It proposes a novel decoupled cascade approach that balances efficiency and semantic reasoning in large-scale person anomaly retrieval.
Findings
SSDC outperforms existing methods on the PAB benchmark.
The framework effectively balances speed and semantic accuracy.
Semantic re-ranking improves retrieval precision.
Abstract
Text-based person anomaly search retrieves specific behavioral events from surveillance archives using natural-language queries. Although recent pose-aware methods align geometric structures well, they face a fundamental Pose-Semantic Gap: semantically different actions can share similar skeletal geometries. While Multimodal Large Language Models (MLLMs) can reduce this ambiguity, using them for large-scale retrieval is computationally prohibitive. We propose the Structure-Semantic Decoupled Cascade (SSDC) framework, which decouples retrieval into two stages: (1) Structure-Aware Coarse Retrieval, where a lightweight model quickly filters candidates by skeletal similarity; and (2) Detective Squad Interaction, a multi-agent semantic verification module. The squad consists of a Detective for fast binary filtering, an Analyst for evidence extraction, and a Writer for semantic synthesis.…
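The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`coarse_retrieve`, `cascade_search`), the use of cosine similarity over toy 2-D vectors, and the `keep` and `score` callables are all assumptions made for illustration. In particular, `keep` stands in for the Detective's binary filter and `score` collapses the Analyst and Writer roles into a single hypothetical semantic score; no MLLM is called.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_retrieve(query_vec: Sequence[float],
                    gallery_vecs: Sequence[Sequence[float]],
                    k: int) -> List[int]:
    """Stage 1 (assumed form): keep the k gallery items closest to the query."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(gallery_vecs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, index breaks ties
    return [i for _, i in scored[:k]]


def cascade_search(query_vec: Sequence[float],
                   gallery_vecs: Sequence[Sequence[float]],
                   k: int,
                   keep: Callable[[int], bool],
                   score: Callable[[int], float]) -> List[int]:
    """Stage 2 (assumed form): filter the shortlist, then re-rank survivors."""
    shortlist = coarse_retrieve(query_vec, gallery_vecs, k)
    survivors = [i for i in shortlist if keep(i)]  # cheap binary filter
    return sorted(survivors, key=score, reverse=True)  # costlier semantic score


# Toy data: item 0 is structurally closest to the query but semantically wrong,
# which mimics the Pose-Semantic Gap. All values below are made up.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]
labels = {0: "running", 1: "falling", 2: "standing", 3: "falling"}
semantic = {0: 0.1, 1: 0.6, 2: 0.0, 3: 0.9}

shortlist = coarse_retrieve([1.0, 0.0], gallery, k=3)
ranked = cascade_search([1.0, 0.0], gallery, k=3,
                        keep=lambda i: labels[i] == "falling",
                        score=lambda i: semantic[i])
print(shortlist, ranked)
```

The point of the cascade is that the expensive semantic step only runs on the `k` shortlisted items rather than on the whole gallery, so its cost scales with `k` instead of with archive size.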
