Model-Document Protocol for AI Search

Hongjin Qian; Zheng Liu

arXiv:2510.25160 · cs.CL · October 31, 2025

TL;DR

This paper introduces the Model-Document Protocol (MDP), a new framework for transforming raw, unstructured documents into structured, task-specific knowledge representations to improve AI search with large language models.

Contribution

The paper proposes the MDP framework, which formalizes multiple pathways for converting raw documents into LLM-ready inputs (agentic reasoning, memory grounding, and structured leveraging), and instantiates it as MDP-Agent.

Findings

01. MDP-Agent outperforms baselines on information-seeking benchmarks.
02. The framework effectively bridges raw documents and LLM reasoning.
03. Structured knowledge representations enhance AI search capabilities.

Abstract

AI search depends on linking large language models (LLMs) with vast external knowledge sources. Yet web pages, PDF files, and other raw documents are not inherently LLM-ready: they are long, noisy, and unstructured. Conventional retrieval methods treat these documents as verbatim text and return raw passages, leaving the burden of fragment assembly and contextual reasoning to the LLM. This gap underscores the need for a new retrieval paradigm that redefines how models interact with documents. We introduce the Model-Document Protocol (MDP), a general framework that formalizes how raw text is bridged to LLMs through consumable knowledge representations. Rather than treating retrieval as passage fetching, MDP defines multiple pathways that transform unstructured documents into task-specific, LLM-ready inputs. These include agentic reasoning, which curates raw evidence into coherent…