Libra: Leveraging Temporal Images for Biomedical Radiology Analysis

Xi Zhang; Zaiqiao Meng; Jake Lever; Edmond S. L. Ho

arXiv:2411.19378·cs.CV·August 5, 2025

TL;DR

Libra is a novel multimodal large language model designed for chest X-ray report generation that effectively captures temporal differences between current and prior images, improving report accuracy.

Contribution

Libra introduces a temporal-aware architecture with a specialized Temporal Alignment Connector for enhanced medical image analysis.

Findings

01

Achieves state-of-the-art performance on the MIMIC-CXR dataset.

02

Effectively captures temporal differences in chest X-ray images.

03

Improves clinical relevance and lexical accuracy in reports.

Abstract

Radiology report generation (RRG) requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. While multimodal large language models (MLLMs) align with pre-trained vision encoders to enhance visual-language understanding, most existing methods rely on single-image analysis or rule-based heuristics to process multiple images, failing to fully leverage temporal information in multi-modal medical datasets. In this paper, we introduce Libra, a temporal-aware MLLM tailored for chest X-ray report generation. Libra combines a radiology-specific image encoder with a novel Temporal Alignment Connector (TAC), designed to accurately capture and integrate temporal differences between paired current and prior images. Extensive experiments on the MIMIC-CXR dataset demonstrate that Libra establishes a new state-of-the-art benchmark among similarly scaled…

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 5·Confidence 4

Strengths

1- The clinical problem statement is fair and important.
2- The evaluation is comprehensive, and the ablation studies show how the method behaves in different scenarios.
3- The authors introduce an interesting local and global learning mechanism.

Weaknesses

1- The main claim of the paper is that it is the first to introduce a VLM for automatic report generation that utilizes temporal scans to ensure more realistic reports that learn from multiple scans acquired at different time points. Regardless of whether it is a VLM or other types of encoder/decoder nets, this claim is not true because multiple works have been published to address this problem. For instance:
- https://aclanthology.org/2023.findings-emnlp.325/
- https://aclanthology.org/2023.fin

Reviewer 02·Rating 5·Confidence 3

Strengths

* Innovative temporal processing. The TAC module is a novel addition that allows Libra to capture and utilize temporal changes in medical images effectively, enhancing the model's clinical applicability.
* Comprehensive ablation studies. The ablation experiments clarify the importance of each submodule (TFM, LFE, and PIPB), reinforcing the credibility of the design choices.
* Comprehensive appendix. The appendix is highly commendable, providing detailed descriptions of the datasets, training con

Weaknesses

Despite the paper's clarity, several imprecise arguments and overstatements necessitate revision and clarification:
* Incomplete framework representation. TFM is a crucial component of the core TAC, but the framework diagram omits the illustration of the $MLP_{final}$ part within TFM. This omission may lead to ambiguity regarding the final processing steps, making it more difficult for readers to fully understand how all modules are integrated within the model. It is recommended that the authors

Reviewer 03·Rating 6·Confidence 4

Strengths

1. The paper presents an innovative approach for handling prior study citations across various time points in report generation tasks.
2. The development of the Temporal Alignment Connector showcases a sophisticated method for capturing and integrating temporal information across multiple images.
3. A comprehensive experimental analysis, including ablation studies and qualitative comparisons, is provided to validate the effectiveness of the proposed methods.

Weaknesses

1. Comparative Results: The comparative results do not convincingly demonstrate Libra's superiority. Although MIMIC-Diff-VQA is derived from MIMIC-CXR, the comparison seems unbalanced, as Libra was trained on both MIMIC-CXR and MIMIC-Diff-VQA, while the other model was only trained on MIMIC-CXR.
2. Effectiveness of the Temporal Alignment Connector: The authors overstate the effectiveness of the Temporal Alignment Connector (e.g., "significant enhancements across all metrics" in line 398). While

Code & Models

Repositories

X-iZhang/Libra
pytorch·Official

Models

Datasets

Yamini-1628/MIMIC-CXR-RRG
dataset· 132 dl

Taxonomy

Topics·AI in cancer detection

Methods·Focus