MDKeyChunker: Single-Call LLM Enrichment with Rolling Keys and Key-Based Restructuring for High-Accuracy RAG

Bhavik Mangla

arXiv:2603.23533·cs.CL·March 30, 2026

TL;DR

MDKeyChunker is a three-stage pipeline for Markdown documents that combines structure-aware chunking, single-call LLM metadata enrichment, and semantic-key-based restructuring of chunks to improve retrieval accuracy in RAG systems.

Contribution

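The paper introduces a three-stage pipeline that extracts all chunk metadata in a single LLM call, carries document-level context forward through a rolling key dictionary, and merges chunks that share a semantic key so related content is co-located for retrieval.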

Findings

01

Achieves perfect recall@5 with BM25 over structural chunks.

02

Extracting all seven metadata fields in a single LLM call removes the need for separate per-field extraction passes.

03

The dense retrieval pipeline attains an MRR of 0.911 on a Markdown corpus.
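For readers unfamiliar with the two metrics quoted in the findings, the sketch below computes recall@k and mean reciprocal rank (MRR) using their standard textbook definitions. The evaluation protocol used in the paper is not described on this page, so the function names, the chunk IDs, and the toy rankings are all illustrative assumptions rather than the author's evaluation code.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant chunks that appear in the top-k results."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the first relevant chunk, one term per query.

    A query with no relevant chunk in its ranking contributes 0.
    """
    total = 0.0
    for ranked_ids, relevant_ids in zip(rankings, relevant):
        relevant_set = set(relevant_ids)
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            if chunk_id in relevant_set:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Toy data (hypothetical chunk IDs): two queries, one relevant chunk each.
rankings = [["c1", "c2", "c3"], ["c9", "c4", "c7"]]
relevant = [["c1"], ["c4"]]

mrr = mean_reciprocal_rank(rankings, relevant)
```

In this toy case the first query finds its relevant chunk at rank 1 and the second at rank 2, so the MRR is the mean of 1 and 1/2. A recall@5 of 1.0 means every relevant chunk was retrieved somewhere in the top five results, regardless of its position.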

Abstract

RAG pipelines typically rely on fixed-size chunking, which ignores document structure, fragments semantic units across boundaries, and requires multiple LLM calls per chunk for metadata extraction. We present MDKeyChunker, a three-stage pipeline for Markdown documents that (1) performs structure-aware chunking treating headers, code blocks, tables, and lists as atomic units; (2) enriches each chunk via a single LLM call extracting title, summary, keywords, typed entities, hypothetical questions, and a semantic key, while propagating a rolling key dictionary to maintain document-level context; and (3) restructures chunks by merging those sharing the same semantic key via bin-packing, co-locating related content for retrieval. The single-call design extracts all seven metadata fields in one LLM invocation, eliminating the need for separate per-field extraction passes. Rolling key…
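The restructuring stage described in the abstract (merging chunks that share a semantic key via bin-packing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Chunk` class, the `merge_by_key` function, the character-based size budget `max_chars`, and the first-fit packing heuristic are all hypothetical choices made here, since the page does not say which bin-packing variant or size measure the author uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    key: str  # semantic key assigned during enrichment (hypothetical field name)


def merge_by_key(chunks, max_chars=1200):
    """Group chunks by semantic key, then first-fit pack each group into bins.

    Assumption: size is measured in characters and separators are not counted.
    A chunk larger than max_chars is kept whole in a bin of its own.
    """
    groups = defaultdict(list)
    for chunk in chunks:
        groups[chunk.key].append(chunk)

    merged = []
    for key, group in groups.items():
        bins = []   # each bin holds the texts of chunks merged together
        sizes = []  # running character count per bin
        for chunk in group:
            size = len(chunk.text)
            for i in range(len(bins)):
                if sizes[i] + size <= max_chars:
                    bins[i].append(chunk.text)
                    sizes[i] += size
                    break
            else:
                # No existing bin has room, so open a new one.
                bins.append([chunk.text])
                sizes.append(size)
        merged.extend(Chunk("\n\n".join(texts), key) for texts in bins)
    return merged


# Toy input: three chunks keyed "install" and one keyed "api".
chunks = [
    Chunk("a" * 500, "install"),
    Chunk("b" * 500, "api"),
    Chunk("c" * 500, "install"),
    Chunk("d" * 500, "install"),
]
merged = merge_by_key(chunks, max_chars=1200)
```

With a 1200-character budget, the first two "install" chunks fit in one bin and the third opens a second bin, so related content ends up co-located without any merged chunk exceeding the budget. A token-based budget would be a natural substitute for the character count if the downstream embedding model has a token limit.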
