Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

Yiqun Sun; Pengfei Wei; Lawrence B. Hsieh

arXiv:2604.14572 · cs.IR · May 18, 2026

TL;DR

Corpus2Skill organizes enterprise documents into a navigable skill directory that LLM agents explore hierarchically, improving answer quality and grounding, although the benefit depends on the scope and structure of the corpus.

Contribution

This work introduces Corpus2Skill, a method to distill document corpora into hierarchical skill directories for improved navigation and knowledge grounding in LLM-based QA systems.
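To make "distill a corpus into a hierarchical skill directory" concrete, here is a minimal sketch of what such an offline step could produce. It assumes each document already carries a topic path (for example, from an existing support-site taxonomy) and substitutes a first-sentence heuristic for the LLM-written summaries a real system would likely use. `SkillNode`, `build_skill_tree`, and `summarize` are hypothetical names, not the API of the Corpus2Skill repository.

```python
from dataclasses import dataclass, field


@dataclass
class SkillNode:
    """Hypothetical skill-directory node: a summary plus child directories and documents."""
    name: str
    summary: str = ""
    children: dict[str, "SkillNode"] = field(default_factory=dict)
    documents: dict[str, str] = field(default_factory=dict)  # doc_id -> text


def summarize(texts: list[str], max_chars: int = 80) -> str:
    """Placeholder summarizer (assumption): joins the first sentence of each text
    and truncates. A real pipeline would presumably call an LLM here."""
    firsts = [t.split(".")[0].strip() for t in texts if t.strip()]
    return "; ".join(firsts)[:max_chars]


def build_skill_tree(corpus: dict[str, tuple[list[str], str]]) -> SkillNode:
    """corpus maps doc_id -> (topic_path, text), e.g. (["billing", "refunds"], "...")."""
    root = SkillNode(name="root")
    for doc_id, (path, text) in corpus.items():
        node = root
        for part in path:
            # Create intermediate directories on first use.
            node = node.children.setdefault(part, SkillNode(name=part))
        node.documents[doc_id] = text
    _fill_summaries(root)
    return root


def _fill_summaries(node: SkillNode) -> str:
    """Bottom-up pass: each directory summarizes its own documents and its children's summaries."""
    child_summaries = [_fill_summaries(c) for c in node.children.values()]
    node.summary = summarize(list(node.documents.values()) + child_summaries)
    return node.summary


corpus = {
    "d1": (["billing", "refunds"], "Refunds take 5 days. Contact support."),
    "d2": (["billing", "invoices"], "Invoices are emailed monthly."),
    "d3": (["account"], "Reset your password from settings."),
}
tree = build_skill_tree(corpus)
print(sorted(tree.children))              # ['account', 'billing']
print(tree.children["billing"].summary)   # Refunds take 5 days; Invoices are emailed monthly
```

Filling summaries bottom-up means every directory's summary describes what lies beneath it, which is the information a navigating agent would read when deciding where to descend.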

Findings

01 Navigation improves answer quality on an enterprise customer-support benchmark.

02 Corpus2Skill outperforms single-shot dense, hybrid, hierarchical-retrieval, and agentic RAG baselines in answer grounding.

03 Effectiveness depends on corpus structure and topical taxonomy.

Abstract

Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results, with no view of how the corpus is organized or what it has not yet seen. We present Corpus2Skill, which distills a document corpus offline into a hierarchical skill directory and lets an LLM agent navigate it at serve time, drilling from a bird's-eye view through progressively finer summaries down to documents, and backtracking when a branch is unproductive. On an enterprise customer-support benchmark, Corpus2Skill improves both answer quality and grounding over single-shot dense, hybrid, hierarchical-retrieval, and agentic RAG baselines at a moderate cost tradeoff. A ten-subset generalization study further shows that corpus navigation is not a universal replacement for retrieval: it consistently helps on single-domain corpora with a recoverable…
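The serve-time behavior described above (drill from a bird's-eye view through finer summaries down to documents, and backtrack when a branch is unproductive) is essentially a depth-first search guided by summaries. The sketch below shows that control flow under stated assumptions: the tree is already built, and a simple keyword-overlap score (`overlap`, with a hypothetical `min_hits` threshold) stands in for the LLM agent's judgment. All names are illustrative, not taken from the paper or its repository.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SkillNode:
    """Hypothetical skill-directory node: a summary plus child nodes and/or documents."""
    name: str
    summary: str
    children: list["SkillNode"] = field(default_factory=list)
    documents: dict[str, str] = field(default_factory=dict)  # doc_id -> text


def overlap(query: str, text: str) -> int:
    """Stand-in relevance score: number of shared lowercase words.
    In Corpus2Skill an LLM agent makes these choices; this is only a placeholder."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def navigate(node: SkillNode, query: str, trail: list[str],
             min_hits: int = 2) -> Optional[tuple[str, list[str]]]:
    """Depth-first drill-down with backtracking.

    Returns (doc_id, path) for the first document whose overlap with the query
    reaches min_hits, or None if this branch is unproductive. `trail` records
    every node visited, including abandoned branches.
    """
    trail.append(node.name)
    # Check documents stored directly at this node.
    for doc_id, text in node.documents.items():
        if overlap(query, text) >= min_hits:
            return doc_id, [node.name]
    # Visit children in order of how relevant their summaries look.
    ranked = sorted(node.children, key=lambda c: overlap(query, c.summary), reverse=True)
    for child in ranked:
        found = navigate(child, query, trail, min_hits)
        if found is not None:
            doc_id, path = found
            return doc_id, [node.name] + path
    return None  # nothing useful below this node: the caller backtracks


# Toy tree: the "billing" summary looks most relevant but leads nowhere.
root = SkillNode("root", "support topics", children=[
    SkillNode("billing", "refund timing for invoices", children=[
        SkillNode("invoices", "monthly invoices",
                  documents={"d1": "Invoices are emailed monthly"}),
    ]),
    SkillNode("orders", "cancelled orders and shipping", children=[
        SkillNode("cancellations", "cancelled orders refund timing",
                  documents={"d2": "Refund timing for cancelled orders is five business days"}),
    ]),
])

trail: list[str] = []
result = navigate(root, "refund timing for cancelled orders", trail)
print(result)  # ('d2', ['root', 'orders', 'cancellations'])
print(trail)   # ['root', 'billing', 'invoices', 'orders', 'cancellations']
```

The trail shows the backtrack: the agent first follows the more promising-looking `billing` summary, finds no answering document under it, returns to the root, and then descends into `orders`.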


Code & Models

Repository: dukesun99/Corpus2Skill (GitHub)
