TL;DR
This paper introduces TILA, a framework that uses temporal inversion as a supervisory signal to improve the sensitivity of vision-language models to changes over time in chest X-ray images.
Contribution
TILA is a novel approach that explicitly incorporates temporal order learning into vision-language models for chest radiographs, enhancing interval change detection.
Findings
TILA improves progression classification accuracy.
TILA enhances temporal embedding alignment.
TILA shows consistent gains across multiple architectures.
Abstract
Recent advances in vision-language pretraining have enabled strong medical foundation models, yet most analyze radiographs in isolation, overlooking the key clinical task of comparing prior and current images to assess interval change. For chest radiographs (CXRs), capturing interval change is essential, as radiologists must evaluate not only the static appearance of findings but also how they evolve over time. We introduce TILA (Temporal Inversion-aware Learning and Alignment), a simple yet effective framework that uses temporal inversion, reversing image pairs, as a supervisory signal to enhance the sensitivity of existing temporal vision-language models to directional change. TILA integrates inversion-aware objectives across pretraining, fine-tuning, and inference, complementing conventional appearance modeling with explicit learning of temporal order. We also propose a unified…
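To make the idea of temporal inversion concrete, here is a minimal sketch of how a reversed image pair could supply an extra training example. It is an illustration under stated assumptions, not the authors' implementation: the three-class label set (improved, stable, worsened), the `CxrPair` type, and the helper names are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List

# Assumed label set (not given in the abstract): swapping the temporal
# order of a pair turns "worsened" into "improved" and vice versa, while
# "stable" is unaffected by direction.
INVERTED_LABEL = {
    "improved": "worsened",
    "worsened": "improved",
    "stable": "stable",
}


@dataclass(frozen=True)
class CxrPair:
    """Hypothetical container for a prior/current chest X-ray pair."""
    prior: str    # identifier of the earlier image
    current: str  # identifier of the later image
    label: str    # progression label for prior -> current


def invert_pair(pair: CxrPair) -> CxrPair:
    """Reverse the temporal order of a pair and flip its label to match."""
    return CxrPair(
        prior=pair.current,
        current=pair.prior,
        label=INVERTED_LABEL[pair.label],
    )


def augment_with_inversions(pairs: List[CxrPair]) -> List[CxrPair]:
    """Return each original pair followed by its temporally inverted copy."""
    augmented: List[CxrPair] = []
    for pair in pairs:
        augmented.append(pair)
        augmented.append(invert_pair(pair))
    return augmented


if __name__ == "__main__":
    example = CxrPair(prior="study_001.png", current="study_002.png", label="worsened")
    for item in augment_with_inversions([example]):
        print(item)
```

A model trained on both orderings cannot rely on static appearance alone, because the same two images now carry opposite labels depending on their order. How TILA actually uses inversion during pretraining, fine-tuning, and inference is described in the paper itself.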
