FrEVL: Leveraging Frozen Pretrained Embeddings for Efficient Vision-Language Understanding

Emmanuelle Bourigault; Pauline Bourigault

arXiv:2508.04469·cs.CV·August 7, 2025

TL;DR

FrEVL shows that frozen pretrained embeddings can reach 85-95% of state-of-the-art performance on discriminative vision-language benchmarks at substantially lower computational cost, which makes the approach suitable for resource-constrained scenarios.

Contribution

The paper introduces FrEVL, a novel framework that leverages frozen pretrained embeddings for efficient vision-language tasks, reducing training complexity and energy consumption.

Findings

01

Achieves 85-95% of state-of-the-art performance on standard benchmarks with only 68.4M trainable parameters.

02

Provides a 2.3x speedup and 52% lower energy consumption when end-to-end computation, including embedding extraction, is accounted for.

03

Effectiveness depends on alignment between pretraining objectives and downstream tasks.

Abstract

The deployment of vision-language models remains constrained by substantial computational requirements. We present FrEVL, a framework exploring whether frozen pretrained embeddings can support effective vision-language understanding. Our analysis reveals that frozen embeddings contain rich information for discriminative tasks, achieving 85% to 95% of state-of-the-art performance on standard benchmarks with only 68.4M trainable parameters. This performance dichotomy reveals a critical insight: frozen embedding effectiveness depends on alignment between pretraining objectives and downstream task requirements. When accounting for end-to-end computation including embedding extraction, FrEVL provides 2.3× speedup with 52% lower energy consumption, making it suitable for scenarios with pre-computable inputs or when deployment constraints outweigh marginal performance gains.…
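The general pattern the abstract describes, extracting embeddings once from frozen encoders and training only a small head on top, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the random unit vectors stand in for real pretrained image and text embeddings, `frozen_pair` and `build_cache` are hypothetical helpers, and the elementwise-product fusion with a logistic head is a deliberately tiny substitute, not the FrEVL architecture (which the abstract reports as having 68.4M trainable parameters).

```python
import math
import random

rng = random.Random(0)
DIM = 16  # toy embedding width (assumption; real encoders are far wider)

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def frozen_pair(match):
    """Hypothetical stand-in for two frozen encoders.

    Matching pairs get a text vector close to the image vector;
    non-matching pairs get an unrelated one.
    """
    img = unit([rng.gauss(0, 1) for _ in range(DIM)])
    if match:
        txt = unit([x + 0.1 * rng.gauss(0, 1) for x in img])
    else:
        txt = unit([rng.gauss(0, 1) for _ in range(DIM)])
    return img, txt

def build_cache(n):
    """Extract embeddings once; training only ever reads this cache."""
    cache = []
    for i in range(n):
        label = i % 2  # alternate matching and non-matching pairs
        img, txt = frozen_pair(label == 1)
        fused = [a * b for a, b in zip(img, txt)]  # elementwise-product fusion
        cache.append((fused, label))
    return cache

def predict(w, b, x):
    """Probability that the image and text match, from the logistic head."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_head(cache, epochs=40, lr=0.5):
    """Fit the only trainable parameters: DIM weights plus one bias."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        rng.shuffle(cache)
        for x, y in cache:
            err = predict(w, b, x) - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

train_cache = build_cache(400)
test_cache = build_cache(200)
w, b = train_head(train_cache)
correct = sum((predict(w, b, x) > 0.5) == (y == 1) for x, y in test_cache)
accuracy = correct / len(test_cache)
print(f"trainable parameters: {len(w) + 1}, held-out accuracy: {accuracy:.2f}")
```

The encoders are never updated, so the cost of embedding extraction is paid once per input and every training epoch touches only the cached vectors. This is the setting the abstract points to when it mentions pre-computable inputs.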
