Knowledge-based learning in Text-RAG and Image-RAG

Alexander Shim; Khalil Saieh; Samuel Clarke

arXiv:2601.08226 · cs.CV · January 14, 2026

TL;DR

This paper compares multi-modal RAG approaches using Vision Transformer and LLMs like LLaMA and ChatGPT for chest X-ray disease detection, highlighting improvements in hallucination reduction and prediction confidence.

Contribution

It introduces a multi-modal RAG framework combining image and text models, demonstrating enhanced disease detection and reduced hallucination in medical imaging.

Findings

01

Text-based RAG reduces hallucinations using external knowledge.

02

Image-based RAG improves prediction confidence with KNN.

03

GPT LLM outperforms LLaMA in calibration and hallucination rate.
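
The second finding mentions KNN in the image-based RAG. The summary does not say how the retrieved neighbors are used, so the following is only a minimal sketch of one plausible reading: retrieve the k most similar training images by cosine similarity of their embeddings and report how often each label appears among them. The function names, the toy 2-D embeddings, and the label strings are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_retrieve(query, index, k=3):
    """Return the k (similarity, label) pairs most similar to the query.

    `index` is a list of (embedding, label) pairs; in a real system the
    embeddings would come from an image encoder (assumption).
    """
    scored = [(cosine_similarity(query, emb), label) for emb, label in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def neighbor_label_frequencies(neighbors):
    """Fraction of retrieved neighbors carrying each label."""
    counts = Counter(label for _, label in neighbors)
    total = len(neighbors)
    return {label: count / total for label, count in counts.items()}


# Toy index with hypothetical 2-D embeddings and labels.
index = [
    ([1.0, 0.0], "Effusion"),
    ([0.9, 0.1], "Effusion"),
    ([0.0, 1.0], "No Finding"),
    ([0.1, 0.9], "No Finding"),
]
neighbors = knn_retrieve([1.0, 0.05], index, k=3)
frequencies = neighbor_label_frequencies(neighbors)
```

In a retrieval-augmented setup, frequencies like these could be passed to the LLM as supporting evidence alongside the query image; whether the paper does exactly this is not stated in the summary.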

Abstract

This research analyzed and compared a multi-modal approach that pairs a Vision Transformer (EVA-ViT) based image encoder with the LLaMA or ChatGPT LLM to reduce the hallucination problem and detect diseases in chest X-ray images. We used the NIH Chest X-ray images to train the model and compared image-based RAG, text-based RAG, and a baseline. [3] [5] As a result, the text-based RAG [2] effectively reduced the hallucination problem by using external knowledge, and the image-based RAG improved prediction confidence and calibration by using KNN methods. [4] Moreover, the GPT LLM showed better performance, a lower hallucination rate, and better Expected Calibration Error (ECE) than the LLaMA-based model. This research shows the challenges of data imbalance and a complex multi-stage structure, but suggests a large experience environment and a balanced example…
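
The abstract compares models by Expected Calibration Error (ECE). As a reminder of what that metric measures, here is a minimal sketch of the common equal-width binning estimate: predictions are grouped by confidence, and ECE is the sample-weighted average gap between each bin's accuracy and its mean confidence. The function name, the bin count of 10, and the half-open binning convention are assumptions; the summary does not state the paper's exact settings.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE estimate.

    confidences: predicted probabilities in [0, 1] for the chosen label.
    correct:     1 (or True) where the prediction was right, else 0.
    Bins are [lo, hi) with confidence 1.0 placed in the last bin (assumption).
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    bin_count = [0] * n_bins
    bin_conf_sum = [0.0] * n_bins
    bin_hits = [0] * n_bins

    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bin_count[idx] += 1
        bin_conf_sum[idx] += conf
        bin_hits[idx] += int(hit)

    ece = 0.0
    for count, conf_sum, hits in zip(bin_count, bin_conf_sum, bin_hits):
        if count == 0:
            continue  # empty bins contribute nothing
        accuracy = hits / count
        mean_confidence = conf_sum / count
        ece += (count / n) * abs(accuracy - mean_confidence)
    return ece


# Overconfident toy model: 90% confidence but only 75% accuracy.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

A lower value means stated confidence tracks observed accuracy more closely, which is the sense in which the abstract reports the GPT-based model as better calibrated.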

Taxonomy

Topics: COVID-19 diagnosis using AI · Medical Imaging and Analysis · Brain Tumor Detection and Classification