O1 Embedder: Let Retrievers Think Before Action

Ruiran Yan; Zheng Liu; Defu Lian

arXiv:2502.07555 · cs.CL · February 13, 2025



TL;DR

The paper introduces O1 Embedder, a novel retrieval model that generates intermediate thoughts before retrieval, significantly enhancing accuracy and generalizability across diverse datasets.
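To make the "think before retrieval" idea concrete, here is a toy sketch. It is not the authors' implementation. It assumes the simplest variant, in which a generated thought is appended to the query before encoding. Several pieces are hypothetical stand-ins: `generate_thought` replaces the LLM reasoning step with a hard-coded lookup, and `embed` is a bag-of-words counter rather than a learned encoder. Documents are ranked by cosine similarity.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy 'embedding': lowercase bag-of-words counts.
    A real retriever would use a learned dense encoder instead."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def generate_thought(query: str) -> str:
    """Hypothetical stand-in for the 'thinking' step. A real system would
    prompt a generative model; here one expansion is hard-coded."""
    thoughts = {
        "why is the sky blue": "rayleigh scattering of sunlight by air molecules",
    }
    return thoughts.get(query.lower(), "")


def retrieve(query: str, corpus: list[str], think: bool = True) -> str:
    """Return the best-matching document, optionally expanding the query
    with a generated thought before it is embedded."""
    q_text = f"{query} {generate_thought(query)}" if think else query
    q_vec = embed(q_text)
    scored = [(cosine(q_vec, embed(doc)), doc) for doc in corpus]
    return max(scored)[1]


CORPUS = [
    "rayleigh scattering of sunlight by air molecules favors short wavelengths",
    "the sky is the limit for why startups grow",
]

if __name__ == "__main__":
    print(retrieve("why is the sky blue", CORPUS, think=False))
    print(retrieve("why is the sky blue", CORPUS, think=True))
```

Without the thought, surface word overlap favors the off-topic second document. With the thought appended, the relevant first document ranks highest. This mirrors, in miniature, the motivation for letting a retriever reason before it embeds the query.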

Contribution

It proposes a new training workflow and optimization method for retrieval models to generate useful thoughts, improving multi-task and zero-shot retrieval capabilities.

Findings

1. Achieved substantial improvements on 12 datasets.
2. Demonstrated strong generalization in out-of-domain scenarios.
3. Enhanced retrieval accuracy with the O1 Embedder approach.

Abstract

The growing power of large language models (LLMs) has revolutionized how people access and utilize information. Notably, LLMs excel at fine-grained data representation, which facilitates precise retrieval of information. They also generate high-quality answers based on external references, enabling the production of useful knowledge. The recent introduction of reasoning models, such as OpenAI O1 and DeepSeek R1, marks another leap forward, highlighting LLMs' ability to think progressively before delivering final answers. This breakthrough significantly improves the ability to address complex tasks, e.g., coding and math proofs. Inspired by this progress, we aim to develop similar capabilities for retrieval models, which hold great promise for tackling critical challenges in the field, including multi-task retrieval, zero-shot retrieval, and tasks requiring intensive…


Taxonomy

Topics: Topic Modeling