LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation
Steven Song, Anirudh Subramanyam, Irene Madejski, Robert L. Grossman

TL;DR
LaB-RAG introduces a novel, small-model-based approach for radiology report generation that leverages categorical labels and retrieval-augmented generation with large language models, avoiding extensive fine-tuning.
Contribution
It demonstrates that simple classification and zero-shot embeddings can effectively transform X-ray images into text labels for improved report generation without task-specific training.
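As a rough illustration of the idea above, the sketch below fits one small logistic regression per label on frozen image embeddings and emits the positive labels as text. It is a minimal sketch under stated assumptions, not the authors' implementation: the label names, the toy 3-dimensional "embeddings", the learning rate, and the 0.5 threshold are all hypothetical, and a real pipeline would take embeddings from a pretrained image encoder.

```python
import math

# Hypothetical label names; the actual label set used in the paper may differ.
LABELS = ["Cardiomegaly", "Pleural Effusion", "Pneumothorax"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit one binary logistic regression with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_labels(embedding, models, threshold=0.5):
    """Turn an image embedding into a list of text labels."""
    names = []
    for name, (w, b) in models.items():
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, embedding)) + b)
        if p >= threshold:
            names.append(name)
    return names


# Toy stand-ins for frozen image embeddings (assumption: 3 dimensions).
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
# One binary target vector per label, aligned with the rows of X.
Y = {
    "Cardiomegaly": [1, 0, 0, 1, 0],
    "Pleural Effusion": [0, 1, 0, 1, 0],
    "Pneumothorax": [0, 0, 1, 0, 0],
}

# The image encoder stays frozen; only these small per-label models are fit.
models = {name: train_logistic(X, Y[name]) for name in LABELS}
print(predict_labels([1.0, 1.0, 0.0], models))
```

The only trained parameters here are the per-label weights and biases, which is the sense in which such a pipeline avoids fine-tuning a large model.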
Findings
Outperforms other retrieval-based methods in radiology report metrics
Achieves competitive results with fine-tuned vision-language models
Shows broad compatibility with existing fine-tuning approaches
Abstract
In the current paradigm of image captioning, deep learning models are trained to generate text from image embeddings of latent features. We challenge the assumption that fine-tuning of large, bespoke models is required to improve model generation accuracy. Here we propose Label Boosted Retrieval Augmented Generation (LaB-RAG), a small-model-based approach to image captioning that leverages image descriptors in the form of categorical labels to boost standard retrieval augmented generation (RAG) with pretrained large language models (LLMs). We study our method in the context of radiology report generation (RRG) over MIMIC-CXR and CheXpert Plus. We argue that simple classification models combined with zero-shot embeddings can effectively transform X-rays into text-space as radiology-specific labels. In combination with standard RAG, we show that these derived text labels can be used with…
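To make the combination of labels and RAG more concrete, here is a minimal sketch of label-aware retrieval followed by prompt assembly. Everything specific in it is an assumption for illustration: the ranking rule (label overlap first, cosine similarity second), the corpus layout, the prompt wording, and the function names `retrieve` and `build_prompt` are hypothetical, since the excerpt above does not specify how the paper filters, ranks, or formats retrieved reports.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb, query_labels, corpus, k=2):
    """Rank reports by label overlap, breaking ties by embedding similarity."""
    def key(item):
        overlap = len(set(query_labels) & set(item["labels"]))
        return (overlap, cosine(query_emb, item["embedding"]))
    return sorted(corpus, key=key, reverse=True)[:k]


def build_prompt(query_labels, retrieved):
    """Assemble a text prompt for a frozen LLM (hypothetical format)."""
    lines = ["You are drafting a chest X-ray report.", "Similar reports:"]
    for i, item in enumerate(retrieved, 1):
        lines.append(f"{i}. Labels: {', '.join(item['labels']) or 'none'}")
        lines.append(f"   Report: {item['report']}")
    lines.append(
        f"Predicted labels for the new image: {', '.join(query_labels) or 'none'}"
    )
    lines.append("Write the findings for the new image.")
    return "\n".join(lines)


# Toy retrieval corpus with made-up embeddings, labels, and report text.
corpus = [
    {"id": "a", "embedding": [1.0, 0.0], "labels": ["Pneumothorax"],
     "report": "Small apical pneumothorax."},
    {"id": "b", "embedding": [0.6, 0.8], "labels": ["Pleural Effusion"],
     "report": "Moderate left pleural effusion."},
    {"id": "c", "embedding": [0.0, 1.0],
     "labels": ["Pleural Effusion", "Cardiomegaly"],
     "report": "Enlarged cardiac silhouette with small effusion."},
]

query_labels = ["Pleural Effusion"]
hits = retrieve([1.0, 0.0], query_labels, corpus, k=2)
prompt = build_prompt(query_labels, hits)
print(prompt)
```

In this toy case report `a` is the nearest neighbor by embedding alone, but it is ranked last because it shares no label with the query, which shows how text labels can steer retrieval before the LLM sees any context.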
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
While the topic is potentially relevant, the submission does not present significant strengths in methodology, novelty, or experimental validation.
1. The methodology is quite naive and lacks novelty. It mainly combines the classifier’s output and lets the LLM rewrite it, without introducing any new mechanism or insight.
2. The baselines are incomplete. Since the method is related to RAG and involves an image classifier with a frozen LLM, the authors should compare with other prior works such as RADAR (arXiv:2505.14318) and V-RAG (arXiv:2502.15040), as well as other similar approaches.
3. The metric selection is not appropriate. The image
1. The paper introduces LaB-RAG, an improved version of prior RAG-based models that achieves performance comparable to fine-tuned models and outperforms all existing RAG-based approaches on the MIMIC-CXR and CheXpert-Plus datasets.
2. The proposed training methodology presents a novel way to leverage pre-trained image encoders and language generation models, supplemented by a lightweight machine learning component (logistic regression). This design enables efficient adaptation to the report gene
1. While Table 2 provides a single example comparing LaB-RAG and CXRMate, the paper lacks a broader analysis of failure modes. Suggestion: Include additional qualitative examples of incorrect generations and categorize common error types.
2. The paper relies entirely on automated evaluation metrics. Suggestion: Incorporate a radiologist or domain expert review of a representative sample of generated reports to assess clinical safety and practical utility.
3. The paper claims that LaB-RAG is ligh
1. Novel yet practical conceptual innovation: The paper combines categorical label prediction with RAG, pairing a negligible-cost classical ML input with modern LLM output generation. The idea moves away from the current trend of full or parameter-efficient tuning of large models, suggesting a flexible, computationally inexpensive alternative.
2. Wide empirical evaluation: The performance of LaB-RAG is evaluated on two large datasets (MIMIC-CXR and Ch
1. Limited conceptual depth beyond engineering: While the engineering is clearly thorough and well done, the real novelty (label-augmented retrieval) is an incremental extension of known paradigms (RAG + structured auxiliary features). The contribution is therefore limited, at least for ICLR, with rigor that is largely empirical rather than theoretical.
2. Dependence upon labeler quality and domain heuristics: The performance is reliant on pre-ext
• The proposed pipeline eliminates the need for fine-tuning large generative models, which reduces computational cost and makes the approach more practical in resource-limited medical settings.
• The framework shows competitive results in small-scale scenarios where training data or GPU resources are limited, suggesting utility as a lightweight alternative to fully fine-tuned VLMs.
• The role separation between the image encoder and LLM provides a clear and modular design that could be extended
• The novelty of the proposed approach is limited. Retrieving similar reports as textual references and guiding generation through label-based filtering resembles a technical enhancement on top of existing retrieval-augmented strategies rather than a fundamentally new report generation paradigm.
• The performance gains are modest and mainly shown against older baselines. As also reflected in the appendix, the improvements do not provide strong evidence that the method can compete with current st
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Weight Decay · Softmax
