Self-Describing Structured Data with Dual-Layer Guidance: A Lightweight Alternative to RAG for Precision Retrieval in Large-Scale LLM Knowledge Navigation
Hung Ming Liu

TL;DR
The paper introduces Self-Describing Structured Retrieval (SDSR), a lightweight retrieval method for structured data that uses human-authored metadata and dual-layer guidance to help large language models navigate large knowledge bases without heavy infrastructure.
Contribution
It proposes a self-describing structured data framework with dual-layer guidance, presented as a simpler alternative to RAG that routes queries to categories more accurately than a no-guidance baseline.
Findings
Version D achieves 100% primary routing accuracy at 119 categories.
SDSR outperforms a no-guidance baseline in category routing accuracy.
Explicit rules enable effective primary routing, but cross-category routing needs architectural design.
Abstract
Large Language Models (LLMs) exhibit a well-documented positional bias when processing long input contexts: information in the middle of a context window receives substantially less attention than content at the boundaries, a phenomenon termed the Lost-in-the-Middle effect (Liu et al., 2024). This limits knowledge-retrieval applications that embed large structured knowledge bases directly in the LLM context. Retrieval-Augmented Generation (RAG) addresses scalability by retrieving only relevant fragments, but introduces substantial infrastructure overhead and is ill-suited to libraries whose semantic boundaries are human-defined rather than statistically learned. We propose Self-Describing Structured Retrieval (SDSR), a lightweight framework in which structured data files embed human-authored navigational metadata at the file's primacy position, thereby exploiting rather than fighting…
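The idea of placing navigational metadata at the file's primacy position can be illustrated with a minimal sketch. This is not the paper's schema: the `_guide` key, the `routing_rules` layout, the example categories, and the `route` helper are all hypothetical names chosen for illustration, and the keyword matcher is only a deterministic stand-in for the LLM that would read the guidance in practice. The dual-layer guidance itself is not reproduced here, since this excerpt does not describe it.

```python
import json


def build_self_describing(categories, routing_rules):
    """Serialize a knowledge file whose FIRST key is the navigation guide.

    Python dicts keep insertion order and json.dumps preserves it
    (sort_keys defaults to False), so "_guide" is emitted before the data.
    The "_guide" name and its fields are hypothetical, not from the paper.
    """
    document = {
        "_guide": {
            "purpose": "Read this block first to pick a category.",
            "routing_rules": routing_rules,
        },
        "categories": categories,
    }
    return json.dumps(document, indent=2)


def route(query, serialized):
    """Stand-in for the model: apply the explicit rules found in the guide."""
    guide = json.loads(serialized)["_guide"]
    lowered = query.lower()
    for rule in guide["routing_rules"]:
        if any(keyword in lowered for keyword in rule["keywords"]):
            return rule["category"]
    return None  # no explicit rule matched


# Hypothetical example data.
categories = {
    "billing": ["How to update a payment method"],
    "shipping": ["How to track a parcel"],
}
routing_rules = [
    {"keywords": ["invoice", "payment"], "category": "billing"},
    {"keywords": ["parcel", "delivery"], "category": "shipping"},
]

serialized = build_self_describing(categories, routing_rules)
```

Calling `route("Where is my parcel?", serialized)` selects the `shipping` category under these assumed rules. The point of the design is that the guide appears at the start of the serialized file, where the abstract says positional bias works in its favor rather than against it.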
