TRACE: Task-Adaptive Reasoning and Representation Learning for Universal Multimodal Retrieval
Xiangzhao Hao, Shijie Wang, Tianyu Yang, Tianyue Wang, Haiyun Guo, and Jinqiao Wang

TL;DR
TRACE introduces a unified model that combines generative reasoning with discriminative embedding learning for improved multimodal retrieval, enabling better handling of complex queries and zero-shot transfer.
Contribution
It presents TRACE, a novel framework that unifies reasoning and representation learning, and introduces M-BEIR-CoT, a large-scale dataset for training and evaluating such models.
Findings
Achieves state-of-the-art results on the M-BEIR benchmark.
Demonstrates implicit routing behavior for complex versus simple queries.
Exhibits strong zero-shot transferability to unseen domains.
Abstract
Universal Multimodal Retrieval requires unified embedding models capable of interpreting diverse user intents, ranging from simple keywords to complex compositional instructions. While Multimodal Large Language Models (MLLMs) possess strong reasoning capabilities, prevailing adaptations confine them to static encoders, underutilizing their generative potential. This encoder-only paradigm struggles with complex intents that demand logical deduction rather than superficial pattern matching. To address this, we introduce TRACE (Task-adaptive Reasoning And Compressing Embeddings). TRACE unifies generative reasoning with discriminative representation learning. It first generates a structured Chain-of-Thought (CoT) to explicitly reason about the query, and subsequently compresses this reasoning trace into a compact embedding via a dedicated token. To train this framework, we construct…
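The reason-then-compress step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the model has already produced one hidden-state vector per token of a sequence made up of the query, a generated reasoning trace, and a final dedicated token, and it reads the query embedding from that token's position. The token names `<think>`, `</think>`, and `<emb>`, the helper functions, and all numeric values are hypothetical placeholders chosen for illustration.

```python
import math

# Hypothetical name for the dedicated compression token; the real token is not given in the source.
EMB_TOKEN = "<emb>"


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def pool_at_token(tokens, hidden_states, token=EMB_TOKEN):
    """Return the normalized hidden state at the last occurrence of `token`.

    Raises ValueError if the token is absent from the sequence.
    """
    idx = len(tokens) - 1 - tokens[::-1].index(token)
    return l2_normalize(hidden_states[idx])


def rank(query_emb, candidates):
    """Sort candidate names by cosine similarity to the query embedding, best first."""
    scores = {
        name: sum(q * c for q, c in zip(query_emb, l2_normalize(vec)))
        for name, vec in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy sequence: query tokens, a made-up reasoning trace, then the dedicated token.
tokens = ["find", "a", "red", "car", "<think>", "vehicle", "color", "</think>", EMB_TOKEN]
# Made-up hidden states: one 2-d vector per token (a real model would supply these).
hidden_states = [[0.1, 0.0]] * 8 + [[3.0, 4.0]]

query_emb = pool_at_token(tokens, hidden_states)
candidates = {
    "red_car": [0.6, 0.8],
    "blue_boat": [1.0, 0.0],
    "green_tree": [-0.6, -0.8],
}
ranking = rank(query_emb, candidates)
print(ranking)
```

Because the dedicated token comes last in the sequence, a causal model's hidden state at that position can attend to both the query and the reasoning trace. That is one plausible reading of how a single vector could summarize the reasoning, though the abstract does not spell out the mechanism.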
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks
