Hybrid, Unified and Iterative: A Novel Framework for Text-based Person Anomaly Retrieval
Tien-Huy Nguyen, Huu-Loc Tran, Huu-Phong Phan-Nguyen, Quang-Vinh Dinh

TL;DR
This paper introduces a novel framework combining a Local-Global Hybrid Perspective module, a Unified Image-Text model, and an iterative ensemble strategy to improve text-based person anomaly retrieval, achieving state-of-the-art results.
Contribution
It proposes a new hybrid, unified, and iterative framework integrating fine-grained features, multiple loss functions, and a novel ensemble strategy for enhanced retrieval performance.
Findings
Achieves a 9.70% improvement in R@1 on the PAB dataset
Demonstrates the effectiveness of the LHP module in feature extraction
Outperforms previous methods with state-of-the-art results
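For readers unfamiliar with the R@1 metric cited above, the following is a minimal sketch of how Recall@K is commonly computed for text-to-image retrieval. It assumes each text query has exactly one correct gallery image; the function name, the toy score matrix, and the ground-truth indices are illustrative and are not taken from the paper or the PAB benchmark.

```python
def recall_at_k(scores, gt_index, k=1):
    """Fraction of queries whose correct image appears in the top-k results.

    scores[i][j] is the similarity of text query i to gallery image j.
    gt_index[i] is the index of the single correct image for query i
    (an assumption made for this sketch).
    """
    hits = 0
    for row, gt in zip(scores, gt_index):
        # Rank gallery indices by descending similarity and keep the top k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += gt in top_k
    return hits / len(scores)


# Toy example: 3 text queries scored against 3 gallery images (made-up values).
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
gt = [0, 1, 2]

r1 = recall_at_k(scores, gt, k=1)
r2 = recall_at_k(scores, gt, k=2)
```

In this toy case only the first query ranks its correct image first, so R@1 is 1/3, while R@2 is 2/3.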
Abstract
Text-based person anomaly retrieval has emerged as a challenging task, with most existing approaches relying on complex deep-learning techniques. This raises a research question: how can the model be optimized to capture finer-grained features? To address this, we propose a Local-Global Hybrid Perspective (LHP) module integrated with a Vision-Language Model (VLM), designed to explore the effectiveness of incorporating both fine-grained and coarse-grained features. Additionally, we investigate a Unified Image-Text (UIT) model that combines multiple objective loss functions, including Image-Text Contrastive (ITC), Image-Text Matching (ITM), Masked Language Modeling (MLM), and Masked Image Modeling (MIM) losses. Beyond this, we propose a novel iterative ensemble strategy that combines model results iteratively rather than simultaneously, as other ensemble methods do.…
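To make the Image-Text Contrastive (ITC) objective named in the abstract more concrete, here is a minimal sketch of the symmetric contrastive loss in its commonly used form. It is not the authors' implementation: the helper names, the temperature value of 0.07, and the toy embeddings are assumptions chosen for illustration, and the abstract does not specify how the four losses are weighted or combined.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the target class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def itc_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    image_embs[i] and text_embs[i] are assumed to describe the same sample.
    The temperature is a hypothetical default, not a value from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Cosine similarities scaled by the temperature: rows are images.
    i2t = [[dot(i, t) / temperature for t in txts] for i in imgs]
    # Transpose to get the text-to-image direction.
    t2i = [list(col) for col in zip(*i2t)]
    return 0.5 * (cross_entropy_diag(i2t) + cross_entropy_diag(t2i))


# Toy batch: two pairs with made-up 2-D embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = itc_loss(images, aligned_texts)
loss_swapped = itc_loss(images, swapped_texts)
```

The loss is small when each image is most similar to its own caption and large when the pairs are mismatched, which is the behavior a contrastive objective is meant to reward.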
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Face recognition and analysis · Time Series Analysis and Forecasting
