KohakuRAG: A simple RAG framework with hierarchical document indexing
Shih-Ying Yeh, Yueh-Feng Ku, Ko-Wei Huang, Buu-Khang Tu

TL;DR
KohakuRAG introduces a hierarchical document indexing framework with improved retrieval, stability, and citation accuracy for question-answering systems, achieving top results on the WattBot 2025 Challenge.
Contribution
It presents a novel four-level tree document structure, an LLM-powered query planner, and ensemble inference techniques for enhanced RAG performance.
Findings
KohakuRAG achieved first place on the WattBot 2025 Challenge leaderboard.
Hierarchical dense retrieval matches hybrid approaches in effectiveness.
Prompt ordering and ensemble voting significantly improve accuracy.
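As a rough illustration of the ensemble voting mentioned above, the sketch below takes the answers from several independent runs and returns the most frequent one. The abstention handling is an assumption made for illustration, not the paper's confirmed rule: runs that decline to answer are ignored unless every run abstains. The names `ABSTAIN` and `vote` are hypothetical.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical marker for a run that declined to answer.
ABSTAIN = None


def vote(answers: List[Optional[str]]) -> Optional[str]:
    """Majority vote over ensemble runs.

    Assumed rule: abstentions do not count as votes, so the ensemble
    abstains only when no run committed to an answer.
    """
    committed = [a for a in answers if a is not ABSTAIN]
    if not committed:
        return ABSTAIN
    # most_common orders by count; equal counts keep first-seen order.
    return Counter(committed).most_common(1)[0][0]
```

Under this assumed rule, a single committed run outvotes any number of abstentions, which favors coverage over caution; a stricter variant could require a minimum share of committed runs before answering.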
Abstract
Retrieval-augmented generation (RAG) systems that answer questions from document collections face compounding difficulties when high-precision citations are required: flat chunking strategies sacrifice document structure, single-query formulations miss relevant passages through vocabulary mismatch, and single-pass inference produces stochastic answers that vary in both content and citation selection. We present KohakuRAG, a hierarchical RAG framework that preserves document structure through a four-level tree representation (document → section → paragraph → sentence) with bottom-up embedding aggregation, improves retrieval coverage through an LLM-powered query planner with cross-query reranking, and stabilizes answers through ensemble inference with abstention-aware voting. We evaluate on the WattBot 2025 Challenge, a benchmark requiring systems to…
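The four-level tree with bottom-up embedding aggregation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: leaf nodes (sentences) are embedded directly, and each parent takes the unweighted mean of its children's vectors. The aggregation rule, the `Node` class, and the `toy_embed` placeholder (which stands in for a real embedding model) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

LEVELS = ("document", "section", "paragraph", "sentence")


@dataclass
class Node:
    level: str  # one of LEVELS
    text: str = ""  # only leaf (sentence) nodes carry text here
    children: List["Node"] = field(default_factory=list)
    embedding: Optional[List[float]] = None


def toy_embed(text: str) -> List[float]:
    # Placeholder for a real embedding model: [length, vowel count].
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]


def aggregate(node: Node, embed: Callable[[str], List[float]] = toy_embed) -> List[float]:
    """Fill in embeddings bottom-up and return this node's vector."""
    if not node.children:
        node.embedding = embed(node.text)
        return node.embedding
    child_vecs = [aggregate(child, embed) for child in node.children]
    dim = len(child_vecs[0])
    # Assumed rule: parent vector is the unweighted mean of its children.
    node.embedding = [
        sum(vec[i] for vec in child_vecs) / len(child_vecs) for i in range(dim)
    ]
    return node.embedding
```

Calling `aggregate` on the document root fills in an embedding at every level, so retrieval can match a query against sentences, paragraphs, or sections and return the enclosing context.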
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Biomedical Text Mining and Ontologies
