On the Cone Effect and Modality Gap in Medical Vision-Language Embeddings

David Restrepo; Miguel L Martins; Chenwei Wu; Luis Filipe Nakayama; Diego M Lopez; Stergios Christodoulidis; Maria Vakalopoulou; Enzo Ferrante

arXiv:2603.17246 · cs.LG · March 23, 2026

Open Access

TL;DR

This paper investigates the "cone effect" and the modality gap in medical vision-language models. It introduces a lightweight post-hoc mechanism for tuning cross-modal separation, which improves downstream task performance without retraining the encoders.

Contribution

The paper presents a lightweight post-hoc method for controlling the modality gap in pretrained VLMs, which makes it possible to analyze systematically how the gap affects medical multimodal tasks.
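Analyzing the gap systematically requires a number for each phenomenon. The sketch below is a minimal example built on two common proxies, which are not necessarily the paper's exact metrics. The cone-effect score is the mean pairwise cosine similarity within one modality. The modality-gap score is the Euclidean distance between the centroids of the L2-normalized image and text embeddings. The function names are hypothetical, and the embeddings are synthetic stand-ins for encoder outputs.

```python
import math
import random

def l2_normalize(v):
    """Scale a vector to unit length, as CLIP-style models do before cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mean_pairwise_cosine(vectors):
    """Hypothetical cone-effect score: average cosine similarity over all distinct pairs.

    Values near 1 mean the embeddings sit in a narrow cone; values near 0 mean
    they are spread across the sphere.
    """
    unit = [l2_normalize(v) for v in vectors]
    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
            pairs += 1
    return total / pairs

def modality_gap(image_embs, text_embs):
    """Hypothetical gap score: distance between the two modality centroids on the unit sphere."""
    ci = centroid([l2_normalize(v) for v in image_embs])
    ct = centroid([l2_normalize(v) for v in text_embs])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ci, ct)))

# Synthetic demo: images cluster around axis 0, texts around axis 1.
random.seed(0)
dim = 8
images = [[(1.0 if k == 0 else 0.0) + random.gauss(0, 0.1) for k in range(dim)] for _ in range(50)]
texts = [[(1.0 if k == 1 else 0.0) + random.gauss(0, 0.1) for k in range(dim)] for _ in range(50)]
print(mean_pairwise_cosine(images))  # close to 1: a narrow cone
print(modality_gap(images, texts))   # large: the two cones barely overlap
```

With real models, the synthetic lists would be replaced by image and text embeddings from a frozen encoder such as CLIP or BioMedCLIP.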

Findings

01

Reducing the modality gap improves downstream performance.

02

Medical datasets are more sensitive to gap modulation.

03

Complete gap collapse is not always optimal.

Abstract

Vision-Language Models (VLMs) exhibit a characteristic "cone effect" in which nonlinear encoders map embeddings into highly concentrated regions of the representation space, contributing to the cross-modal separation known as the modality gap. While this phenomenon has been widely observed, its practical impact on supervised multimodal learning, particularly in medical domains, remains unclear. In this work, we introduce a lightweight post-hoc mechanism that keeps pretrained VLM encoders frozen while continuously controlling cross-modal separation through a single hyperparameter λ. This enables systematic analysis of how the modality gap affects downstream multimodal performance without expensive retraining. We evaluate generalist (CLIP, SigLIP) and medically specialized (BioMedCLIP, MedSigLIP) models across diverse medical and natural datasets in a supervised multimodal…
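The abstract does not specify how λ acts on the frozen embeddings. The sketch below is illustrative only and assumes a centroid-shifting construction similar to those used in earlier modality-gap analyses of CLIP. Each modality is translated along the vector between the two modality centroids and then projected back onto the unit sphere. The name `shift_modalities` and its `lam` argument are hypothetical, and this is not claimed to be the authors' mechanism.

```python
import math
import random

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def modality_gap(image_embs, text_embs):
    """Distance between the centroids of the L2-normalized image and text embeddings."""
    ci = centroid([l2_normalize(v) for v in image_embs])
    ct = centroid([l2_normalize(v) for v in text_embs])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ci, ct)))

def shift_modalities(image_embs, text_embs, lam):
    """Hypothetical lambda-controlled gap shift; the encoders are never touched.

    lam = 0 returns the normalized embeddings unchanged; lam = 1 moves both
    modalities so their centroids coincide before re-normalization.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Gap vector pointing from the text centroid to the image centroid.
    delta = [a - b for a, b in zip(centroid(imgs), centroid(txts))]
    half = lam / 2.0
    # Move each modality toward the other by half of lam * delta, then re-project to the unit sphere.
    new_imgs = [l2_normalize([x - half * d for x, d in zip(v, delta)]) for v in imgs]
    new_txts = [l2_normalize([y + half * d for y, d in zip(v, delta)]) for v in txts]
    return new_imgs, new_txts

# Synthetic demo: two well-separated cones, shifted by increasing lambda.
random.seed(0)
dim = 8
images = [[(1.0 if k == 0 else 0.0) + random.gauss(0, 0.1) for k in range(dim)] for _ in range(50)]
texts = [[(1.0 if k == 1 else 0.0) + random.gauss(0, 0.1) for k in range(dim)] for _ in range(50)]
gaps = {}
for lam in (0.0, 0.5, 1.0):
    new_imgs, new_txts = shift_modalities(images, texts, lam)
    gaps[lam] = modality_gap(new_imgs, new_txts)
    print(f"lambda={lam:.1f}  gap={gaps[lam]:.3f}")
```

Under this construction, λ = 0 reproduces the original geometry, and λ = 1 nearly collapses the centroid gap. The findings above report that complete collapse is not always optimal, so λ would presumably be swept on validation data rather than fixed at 1.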

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI