CXR-Agent: Vision-language models for chest X-ray interpretation with uncertainty aware radiology reporting

Naman Sharma

arXiv:2407.08811 · eess.IV · July 15, 2024 · 1 citation

Open Access

TL;DR

This paper evaluates state-of-the-art vision-language models for chest X-ray interpretation, identifies hallucination issues, and develops an uncertainty-aware report generation approach that improves accuracy and interpretability.

Contribution

It introduces an agent-based vision-language method for radiology report generation that incorporates uncertainty estimation and localizes pathologies, advancing clinical applicability.

Findings

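01. Under linear probing, CheXagent's vision transformer and Q-former outperform the industry-standard Torch X-ray Vision models across many datasets.

02. Vision-language models often hallucinate in confident language, which is a significant challenge for clinical use.

03. Uncertainty-aware reports improve interpretability and safety.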

Abstract

Recently, large vision-language models have shown potential for interpreting complex images and generating natural language descriptions using advanced reasoning. Medicine is inherently multimodal, combining scans and text-based medical histories to write reports, and so stands to benefit from these leaps in AI capabilities. We evaluate the publicly available, state-of-the-art, foundational vision-language models for chest X-ray interpretation across several datasets and benchmarks. We use linear probes to evaluate the performance of various components, including CheXagent's vision transformer and Q-former, which outperform the industry-standard Torch X-ray Vision models across many different datasets, showing robust generalisation capabilities. Importantly, we find that vision-language models often hallucinate with confident language, which slows down clinical…
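The linear-probe evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the features below are synthetic stand-ins for frozen encoder embeddings, and the names `train_linear_probe` and `probe_accuracy`, the embedding size, and the hyperparameters are all assumptions chosen for illustration. The idea is only that the encoder stays fixed and a single linear classifier is trained on its outputs, so held-out accuracy reflects how linearly separable the frozen representation is.

```python
import numpy as np


def train_linear_probe(features, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe on frozen features by gradient descent.

    Hypothetical helper: only the weights w and bias b are learned;
    the features themselves are never updated.
    """
    n_samples, n_dims = features.shape
    w = np.zeros(n_dims)
    b = 0.0
    for _ in range(epochs):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
        err = probs - labels                   # gradient of the log loss w.r.t. logits
        w -= lr * (features.T @ err / n_samples + l2 * w)
        b -= lr * err.mean()
    return w, b


def probe_accuracy(features, labels, w, b):
    """Fraction of samples whose predicted class matches the label."""
    preds = (features @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())


# Synthetic stand-in for embeddings from a frozen image encoder (assumption:
# 16-dimensional features and one binary finding label per image).
rng = np.random.default_rng(0)
true_direction = rng.normal(size=16)
features = rng.normal(size=(400, 16))
labels = (features @ true_direction > 0).astype(float)

# Train on the first 300 samples, evaluate on the remaining 100.
w, b = train_linear_probe(features[:300], labels[:300])
accuracy = probe_accuracy(features[300:], labels[300:], w, b)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In a real comparison, the same probe would be fitted on embeddings from each frozen component under test, so that differences in held-out scores can be attributed to the representations rather than to the classifier.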

Taxonomy

Topics: COVID-19 diagnosis using AI · Image Retrieval and Classification Techniques

Methods: Attention Is All You Need · Softmax · Residual Connection · Layer Normalization · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer