FREE: Fast and Robust Vision Language Models with Early Exits

Divya Jyoti Bajpai; Manjesh Kumar Hanawal

arXiv:2506.06884·cs.LG·June 10, 2025

Open Access

TL;DR

This paper introduces FREE, an adversarial training method for early-exit classifiers in vision-language models that speeds up inference by over 1.51x while maintaining comparable accuracy and robustness.

Contribution

We propose a novel adversarial training framework for early exit strategies in VLMs, improving inference speed and robustness with minimal performance loss.

Findings

01 Speeds up inference by over 1.51x

02 Enhances model robustness against overthinking

03 Maintains comparable accuracy with faster inference

Abstract

In recent years, Vision-Language Models (VLMs) have shown remarkable performance improvements in vision-language tasks. Their large size, however, poses challenges for real-world applications where inference latency is a concern. To tackle this issue, we propose employing Early Exit (EE) strategies in VLMs. Training exit classifiers in VLMs is challenging, though, particularly with limited labeled training data. To address this, we introduce FREE, an adversarial training approach within a GAN-based framework. Here, each exit consists of a transformer layer and a classifier. The transformer layer is adversarially trained to produce feature representations similar to those of the final layer, while a feature classifier serves as the discriminator. Our method focuses on input-adaptive inference that increases inference speed with minimal drop in performance. Experimental results…
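The input-adaptive inference described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exit criterion, so the rule used here (stop at the first exit whose top softmax probability reaches a threshold) is an assumption borrowed from common early-exit practice, not necessarily the authors' rule. The names `softmax`, `early_exit_predict`, and `threshold` are hypothetical, and the per-exit logits stand in for the outputs of the exit classifiers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Return (predicted_class, exit_index, confidence).

    `exit_logits` is an iterable yielding one logit vector per exit,
    ordered from the shallowest exit to the final layer. Passing a
    generator keeps deeper exits from being computed once an earlier
    exit is confident enough.

    Assumption: an exit fires when its top softmax probability is at
    least `threshold`; otherwise the last exit's prediction is used.
    """
    result = None
    for index, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        result = (probs.index(confidence), index, confidence)
        if confidence >= threshold:
            break  # confident enough: skip all deeper exits
    if result is None:
        raise ValueError("exit_logits must yield at least one logit vector")
    return result


# An "easy" input is resolved at the first exit,
# a "hard" input falls through to the last one.
easy = [[0.0, 5.0], [0.0, 6.0], [0.0, 7.0]]
hard = [[0.1, 0.0], [0.2, 0.0], [0.0, 3.0]]
easy_class, easy_exit, _ = early_exit_predict(easy)
hard_class, hard_exit, _ = early_exit_predict(hard)
```

In this sketch a lower `threshold` trades accuracy for speed, since more inputs leave at shallow exits; how FREE sets or tunes its exit decision is not specified in the abstract.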

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications