Model-Document Protocol for AI Search
Hongjin Qian, Zheng Liu

TL;DR
This paper introduces the Model-Document Protocol (MDP), a new framework for transforming raw, unstructured documents into structured, task-specific knowledge representations to improve AI search with large language models.
Contribution
The paper proposes the MDP framework, which formalizes multiple pathways for converting raw documents into LLM-ready inputs: agentic reasoning, memory grounding, and structured leveraging. It also presents an instantiation of the framework called MDP-Agent.
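To make the idea of a "pathway" concrete, the sketch below shows one way such an interface could look in Python. It is an illustration only, not the paper's MDP-Agent: the names `Document`, `Pathway`, `KeywordCurationPathway`, and `build_llm_input` are hypothetical, and the keyword filter is a toy stand-in for the agentic reasoning step, which this summary does not specify.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Document:
    """A raw document as retrieved, before any transformation."""
    doc_id: str
    text: str


class Pathway(Protocol):
    """Hypothetical interface: turns raw documents into an LLM-ready input."""

    def transform(self, docs: list[Document], task: str) -> str: ...


class KeywordCurationPathway:
    """Toy stand-in for curation: keep sentences that share a word with the task."""

    def transform(self, docs: list[Document], task: str) -> str:
        task_words = {w.lower() for w in task.split()}
        kept = []
        for doc in docs:
            for sentence in doc.text.split("."):
                sentence = sentence.strip()
                words = {w.lower() for w in sentence.split()}
                # Keep the sentence only if it overlaps with the task wording.
                if sentence and task_words & words:
                    kept.append(f"[{doc.doc_id}] {sentence}.")
        return "\n".join(kept)


def build_llm_input(pathway: Pathway, docs: list[Document], task: str) -> str:
    """Assemble a prompt from the task and the pathway's curated context."""
    context = pathway.transform(docs, task)
    return f"Task: {task}\nContext:\n{context}"
```

Under these assumptions, swapping in a different pathway (for example, one that reads from accumulated notes or a graph) would only require another class with the same `transform` signature; the prompt assembly would stay unchanged.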
Findings
MDP-Agent outperforms baselines on information-seeking benchmarks.
The framework effectively bridges raw documents and LLM reasoning.
Structured knowledge representations enhance AI search capabilities.
Abstract
AI search depends on linking large language models (LLMs) with vast external knowledge sources. Yet web pages, PDF files, and other raw documents are not inherently LLM-ready: they are long, noisy, and unstructured. Conventional retrieval methods treat these documents as verbatim text and return raw passages, leaving the burden of fragment assembly and contextual reasoning to the LLM. This gap underscores the need for a new retrieval paradigm that redefines how models interact with documents. We introduce the Model-Document Protocol (MDP), a general framework that formalizes how raw text is bridged to LLMs through consumable knowledge representations. Rather than treating retrieval as passage fetching, MDP defines multiple pathways that transform unstructured documents into task-specific, LLM-ready inputs. These include agentic reasoning, which curates raw evidence into coherent…
