Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference

Wenxuan Xie; Yujia Wang; Xin Tan; Chaochao Lu; Xia Hu; Xuhong Wang

arXiv:2602.10021 · cs.CL · February 11, 2026

TL;DR

DRIFT introduces a dual-model framework that decouples knowledge extraction from reasoning in LLMs, enabling more efficient long-context inference by dynamically compressing document chunks into implicit fact tokens.

Contribution

The paper presents DRIFT, a novel dual-model architecture that explicitly separates knowledge extraction from reasoning, improving long-context inference in LLMs.

Findings

1. Significantly outperforms baselines on long-context tasks.
2. Enhances the effective context window and reasoning capabilities.
3. Provides a scalable, efficient inference paradigm.

Abstract

The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge editing, are often constrained in practice by finite context windows, retriever noise, or the risk of catastrophic forgetting. In this paper, we propose DRIFT, a novel dual-model architecture designed to explicitly decouple knowledge extraction from the reasoning process. Unlike static prompt compression, DRIFT employs a lightweight knowledge model to dynamically compress document chunks into implicit fact tokens conditioned on the query. These dense representations are projected into the reasoning model's embedding space, replacing raw, redundant text while maintaining inference accuracy.…
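The data flow described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the abstract does not say how the knowledge model performs compression, so attention pooling with a few learned "slot" vectors is swapped in here as a stand-in, and every name (`compress_chunk`, `project`, `build_reasoner_input`, `slot_queries`, `W`), every dimension, and the random "embeddings" are assumptions made for the example.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def compress_chunk(chunk_embs, query_emb, slot_queries):
    """Pool one chunk's token embeddings into len(slot_queries) fact tokens.

    ASSUMPTION: attention pooling stands in for the knowledge model. Each
    slot vector is shifted by the query embedding, so the pooling weights
    depend on the query (the "conditioned on the query" part).
    """
    d = len(query_emb)
    fact_tokens = []
    for slot in slot_queries:
        q = [s + u for s, u in zip(slot, query_emb)]
        scores = [sum(a * b for a, b in zip(q, tok)) / math.sqrt(d)
                  for tok in chunk_embs]
        weights = softmax(scores)
        fact_tokens.append(
            [sum(w * tok[j] for w, tok in zip(weights, chunk_embs))
             for j in range(d)]
        )
    return fact_tokens

def project(fact_tokens, W):
    """Map fact tokens into the reasoning model's embedding space.

    ASSUMPTION: a single linear map W (d_reasoner rows, d_knowledge columns).
    """
    return [[sum(w * x for w, x in zip(row, f)) for row in W]
            for f in fact_tokens]

def build_reasoner_input(chunks, query_emb_k, query_embs_r, slot_queries, W):
    """Replace raw chunk text with projected fact tokens, then append the query."""
    prefix = []
    for chunk in chunks:
        prefix.extend(project(compress_chunk(chunk, query_emb_k, slot_queries), W))
    return prefix + query_embs_r

# Toy setup with random vectors in place of real model embeddings.
rng = random.Random(0)
d_k, d_r = 8, 16  # hypothetical knowledge-model and reasoner widths

def rand_vec(d):
    return [rng.gauss(0.0, 1.0) for _ in range(d)]

chunks = [[rand_vec(d_k) for _ in range(50)] for _ in range(3)]  # 3 chunks, 50 tokens each
slot_queries = [rand_vec(d_k) for _ in range(4)]                 # 4 fact tokens per chunk
W = [rand_vec(d_k) for _ in range(d_r)]                          # projection matrix
query_emb_k = rand_vec(d_k)                                      # pooled query, knowledge space
query_embs_r = [rand_vec(d_r) for _ in range(5)]                 # 5 query tokens, reasoner space

reasoner_inputs = build_reasoner_input(
    chunks, query_emb_k, query_embs_r, slot_queries, W
)
# 3 chunks * 4 fact tokens + 5 query tokens = 17 positions,
# versus 3 * 50 + 5 = 155 if the raw chunk tokens were passed through.
print(len(reasoner_inputs), len(reasoner_inputs[0]))
```

In this toy the reasoning model would receive 17 embedding positions instead of 155. The ratio follows only from the sizes chosen for the example and says nothing about the compression rates reported in the paper.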

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques