CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs
Garapati Keerthana, Manik Gupta

TL;DR
CLI-RAG is a domain-specific framework that enhances clinical text generation by combining hierarchical chunking and dual-stage retrieval, effectively handling unstructured, dense, and lengthy clinical notes with improved semantic alignment.
Contribution
It introduces a novel hierarchical chunking and dual-stage retrieval mechanism tailored for clinical notes, improving relevance and structure in LLM-based clinical text generation.
Findings
Achieves an average alignment score of 87.7%, surpassing the baseline of 80.7% (a gain of 7.0 percentage points).
Maintains high consistency and semantic alignment across multiple visits.
Demonstrates effectiveness on the MIMIC-III clinical notes dataset.
Abstract
Large language models (LLMs), including zero-shot and few-shot paradigms, have shown promising capabilities in clinical text generation. However, real-world applications face two key challenges: (1) patient data is highly unstructured, heterogeneous, and scattered across multiple note types and (2) clinical notes are often long and semantically dense, making naive prompting infeasible due to context length constraints and the risk of omitting clinically relevant information. We introduce CLI-RAG (Clinically Informed Retrieval-Augmented Generation), a domain-specific framework for structured and clinically grounded text generation using LLMs. It incorporates a novel hierarchical chunking strategy that respects clinical document structure and introduces a task-specific dual-stage retrieval mechanism. The global stage identifies relevant note types using evidence-based queries, while the…
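To make the two ideas named in the abstract concrete, here is a minimal sketch of structure-aware chunking followed by a two-stage retrieval. It is not the authors' implementation. Several parts are assumptions made only for illustration: the `Header:` line convention used to detect sections, the bag-of-words cosine score standing in for whatever similarity model the paper uses, and all function and parameter names. The abstract is cut off before it describes the second stage, so the local stage shown here (ranking chunks inside the selected note types) is a guess at the general shape, not a description of the paper's method.

```python
import re
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    note_type: str  # e.g. "Radiology" (illustrative label)
    section: str    # header the chunk was found under
    text: str


def hierarchical_chunk(note_type, text, max_words=40):
    """Split a note into sections at 'Header:' lines (assumed convention),
    then split each section into chunks of at most max_words words."""
    chunks, section, buffer = [], "PREAMBLE", []

    def flush():
        # Emit the buffered lines of the current section as word-bounded chunks.
        words = " ".join(buffer).split()
        for i in range(0, len(words), max_words):
            chunks.append(Chunk(note_type, section, " ".join(words[i:i + max_words])))
        buffer.clear()

    for line in text.splitlines():
        header = re.fullmatch(r"\s*([A-Za-z][A-Za-z /]*):\s*", line)
        if header:
            flush()  # close the previous section before starting a new one
            section = header.group(1).strip().upper()
        elif line.strip():
            buffer.append(line.strip())
    flush()
    return chunks


def bow(text):
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def dual_stage_retrieve(chunks, global_query, local_query, top_types=1, top_chunks=2):
    # Global stage: score each note type by its best-matching chunk, keep the top types.
    gq = bow(global_query)
    type_scores = {}
    for c in chunks:
        score = cosine(gq, bow(c.text))
        type_scores[c.note_type] = max(type_scores.get(c.note_type, 0.0), score)
    keep = set(sorted(type_scores, key=type_scores.get, reverse=True)[:top_types])

    # Local stage (assumed): rank only the chunks that belong to the kept note types.
    lq = bow(local_query)
    candidates = [c for c in chunks if c.note_type in keep]
    return sorted(candidates, key=lambda c: cosine(lq, bow(c.text)), reverse=True)[:top_chunks]


# Usage with two made-up notes (not MIMIC-III data).
notes = {
    "Nursing": "Assessment:\nPatient resting comfortably. Pain controlled.\nPlan:\nContinue monitoring vitals.",
    "Radiology": "Findings:\nChest x-ray shows right lower lobe pneumonia.\nImpression:\nPneumonia, no effusion.",
}
chunks = [c for note_type, text in notes.items() for c in hierarchical_chunk(note_type, text)]
hits = dual_stage_retrieve(chunks, "chest imaging pneumonia findings", "pneumonia effusion")
for hit in hits:
    print(hit.note_type, hit.section, hit.text)
```

Filtering by note type first keeps the second-stage search small, which is one plausible way to stay within a context-length budget; a real system would replace `bow` and `cosine` with a learned embedding and a vector index.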
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Multimodal Machine Learning Applications
