CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty-aware radiology reporting
Naman Sharma

TL;DR
This paper evaluates state-of-the-art vision-language models for chest X-ray interpretation, identifies hallucination issues, and develops an uncertainty-aware report generation approach that improves accuracy and interpretability.
Contribution
It introduces an agent-based vision-language method for radiology report generation that incorporates uncertainty estimation and localizes pathologies, advancing clinical applicability.
Findings
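Linear probes on vision-language model components (CheXagent's vision transformer and Q-former) outperform the Torch X-ray Vision models on multiple datasets.
Vision-language models often hallucinate while using confident language, which is a significant challenge.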
Uncertainty-aware reports improve interpretability and safety.
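The page does not say how uncertainty is turned into report language. As a purely illustrative sketch, assuming a model that outputs one probability per finding, a simple approach is to map each probability to a hedging phrase and omit low-probability findings. This avoids stating every finding with the same confidence. The thresholds, phrases, and function names below are hypothetical and not taken from the paper.

```python
# Hypothetical probability thresholds and hedging templates (not from the paper).
# Checked from highest to lowest; a template of None means "do not report".
HEDGES = [
    (0.9, "{finding} is present."),
    (0.7, "{finding} is likely present."),
    (0.4, "possible {finding}; clinical correlation is advised."),
    (0.0, None),
]


def hedge_sentence(finding, prob):
    """Return a hedged sentence for one finding, or None if it should be omitted."""
    for threshold, template in HEDGES:
        if prob >= threshold:
            if template is None:
                return None
            sentence = template.format(finding=finding)
            # Capitalise only the first character so finding names stay lowercase mid-sentence.
            return sentence[0].upper() + sentence[1:]
    return None


def draft_findings(probs):
    """Join hedged sentences for all reportable findings, most probable first."""
    sentences = []
    for finding, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        s = hedge_sentence(finding, p)
        if s is not None:
            sentences.append(s)
    return " ".join(sentences)


report = draft_findings(
    {"cardiomegaly": 0.95, "pleural effusion": 0.75, "pneumothorax": 0.5, "edema": 0.1}
)
print(report)
```

With these inputs the draft states cardiomegaly plainly, hedges the effusion and pneumothorax, and leaves out edema entirely.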
Abstract
Recently, large vision-language models have shown potential for interpreting complex images and generating natural-language descriptions through advanced reasoning. Medicine is inherently multimodal: reports are written from both scans and text-based medical histories. This makes it well placed to benefit from these advances in AI capabilities. We evaluate publicly available, state-of-the-art foundation vision-language models for chest X-ray interpretation across several datasets and benchmarks. We use linear probes to evaluate individual components, including CheXagent's vision transformer and Q-former. These components outperform the industry-standard Torch X-ray Vision models across many datasets, showing robust generalisation. Importantly, we find that vision-language models often hallucinate with confident language, which slows down clinical…
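A linear probe trains only a linear classifier on frozen features from a pretrained component. Its accuracy therefore reflects how much label information those features already encode. The page does not give the paper's probe setup. The sketch below assumes precomputed embeddings (for example, pooled vision-transformer or Q-former outputs) and multi-label pathology targets. It fits one logistic-regression head per label with plain NumPy gradient descent. The function names, hyperparameters, and synthetic data are illustrative only.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.5, epochs=1000, l2=1e-4):
    """Fit one logistic-regression probe per label on frozen embeddings.

    features: (n_samples, dim) array of frozen encoder outputs (assumed precomputed).
    labels:   (n_samples, n_labels) binary array, one column per pathology.
    Returns weights of shape (dim, n_labels) and bias of shape (n_labels,).
    """
    n, d = features.shape
    k = labels.shape[1]
    w = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # per-label sigmoid
        err = probs - labels  # d(binary cross-entropy)/d(logits), per sample
        grad_w = features.T @ err / n + l2 * w  # mean gradient plus L2 penalty
        grad_b = err.mean(axis=0)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_proba(features, w, b):
    """Per-label probabilities from a trained probe."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))


# Synthetic stand-in for frozen embeddings: 200 images, 16-dim features, 3 pathologies.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
true_w = rng.normal(size=(16, 3))
Y = (X @ true_w > 0).astype(float)

w, b = train_linear_probe(X, Y)
acc = ((predict_proba(X, w, b) > 0.5) == Y).mean()
print(f"training accuracy: {acc:.3f}")
```

Because the encoder stays frozen, comparing probe accuracy across encoders (for example, a vision-language model's vision tower against a conventional classifier backbone) isolates the quality of the learned features rather than the capacity of the classification head.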
Taxonomy
Topics: COVID-19 diagnosis using AI · Image Retrieval and Classification Techniques
Methods: Attention Is All You Need · Softmax · Residual Connection · Layer Normalization · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer
