LoFi: Location-Aware Fine-Grained Representation Learning for Chest X-ray

Myeongkyun Kang; Yanting Yang; Xiaoxiao Li

arXiv:2603.19451·cs.CV·April 28, 2026

TL;DR

LoFi introduces a location-aware, fine-grained representation learning approach for chest X-ray analysis, improving retrieval and grounding by leveraging region-level supervision and dense captioning.

Contribution

The paper presents a joint optimization framework that combines sigmoid, captioning, and location-aware captioning losses, and adds a fine-grained encoder to improve chest X-ray grounding.

Findings

01. Achieves superior retrieval performance on MIMIC-CXR and PadChest-GR.

02. Effectively incorporates region-level supervision through grounding and dense captioning.

03. Enhances fine-grained representation learning in chest X-ray analysis.

Abstract

Fine-grained representation learning is crucial for retrieval and phrase grounding in chest X-rays, where clinically relevant findings are often spatially confined. However, the lack of region-level supervision in contrastive models and the limited ability of large vision language models to capture fine-grained representations in external validation lead to suboptimal performance on these tasks. To address these limitations, we propose Location-aware Fine-grained representation learning (LoFi), which jointly optimizes sigmoid, captioning, and location-aware captioning losses using a lightweight large language model. The location-aware captioning loss enables region-level supervision through grounding and dense captioning objectives, thereby facilitating fine-grained representation learning. Building upon these representations, we integrate a fine-grained encoder into retrieval-based…
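
To make the joint objective named in the abstract more concrete, the following is a minimal sketch of how a sigmoid contrastive loss, a captioning loss, and a location-aware captioning loss might be summed into a single training objective. It is not the authors' implementation. The function names, the temperature and bias values, the loss weights, and the choice to treat the location-aware term as token cross-entropy over a caption sequence that includes location tokens are all assumptions made for illustration.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def sigmoid_contrastive_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a batch of paired image/text embeddings.

    temperature and bias are illustrative constants (assumed, not from the paper).
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = temperature * img @ txt.T + bias
    # +1 for matching pairs (diagonal), -1 for all other pairs.
    labels = 2.0 * np.eye(len(img)) - 1.0
    return -np.sum(log_sigmoid(labels * logits)) / len(img)

def caption_loss(token_logits, target_ids):
    """Mean token cross-entropy; token_logits is (T, V), target_ids is (T,)."""
    shifted = token_logits - token_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(target_ids)), target_ids])

def joint_loss(img_emb, txt_emb, cap_logits, cap_ids, loc_logits, loc_ids,
               w_sig=1.0, w_cap=1.0, w_loc=1.0):
    """Weighted sum of the three terms; the weights are hypothetical.

    The location-aware term is assumed here to be the same cross-entropy,
    applied to a target sequence that also contains location tokens.
    """
    return (w_sig * sigmoid_contrastive_loss(img_emb, txt_emb)
            + w_cap * caption_loss(cap_logits, cap_ids)
            + w_loc * caption_loss(loc_logits, loc_ids))
```

In this sketch, the sigmoid term scores every image/text pair in the batch independently, so it needs no batch-wide softmax normalization, while the two captioning terms supervise the text decoder at the token level.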
