ORION: ORthonormal Text Encoding for Universal VLM AdaptatION
Omprakash Chakraborty, Jose Dolz, Ismail Ben Ayed

TL;DR
ORION is a fine-tuning framework for text encoders that enhances vision-language models by promoting orthogonality among class representations, leading to improved task discriminability across multiple benchmarks.
Contribution
ORION introduces a novel orthogonality-based loss for fine-tuning text encoders using only class names, improving the quality of the textual prototypes used by VLMs.
Findings
Consistently improves performance across 11 benchmarks.
Enhances various VLM backbones in zero-shot, few-shot, and test-time adaptation.
Provides a probabilistic interpretation of the orthogonality penalty.
Abstract
Vision-language models (VLMs) have demonstrated remarkable generalization across diverse tasks, yet their performance remains constrained by the quality and geometry of the textual prototypes used to represent classes. Standard zero-shot classifiers, derived from frozen text encoders and handcrafted prompts, may yield correlated or weakly separated embeddings that limit task-specific discriminability. We introduce ORION, a text encoder fine-tuning framework that improves pretrained VLMs using only class names. Our method optimizes, via low-rank adaptation, a novel loss that integrates two terms: one promoting pairwise orthogonality between the textual representations of the classes of a given task, and the other penalizing deviations from the initial class prototypes. Furthermore, we provide a probabilistic interpretation of our orthogonality penalty, connecting it to the general maximum…
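The two-term objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the exact form of each term (squared off-diagonal cosine similarity for the orthogonality term, one minus cosine similarity for the deviation term), the weight `lam`, and all function names are assumptions chosen for illustration. The low-rank adaptation of the text encoder is omitted, so the functions operate directly on class prototype matrices.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def orthogonality_penalty(prototypes):
    """Mean squared cosine similarity over distinct class pairs.

    Assumed form: 0 when all prototypes are mutually orthogonal,
    1 when they are all collinear.
    """
    t = l2_normalize(prototypes)
    k = t.shape[0]
    if k < 2:
        return 0.0
    gram = t @ t.T
    off_diag = gram - np.diag(np.diag(gram))
    return float(np.sum(off_diag ** 2) / (k * (k - 1)))


def anchor_penalty(prototypes, initial_prototypes):
    """Mean (1 - cosine similarity) between current and initial prototypes.

    Assumed form of the term that penalizes drift from the initial prototypes.
    """
    t = l2_normalize(prototypes)
    t0 = l2_normalize(initial_prototypes)
    return float(np.mean(1.0 - np.sum(t * t0, axis=-1)))


def orion_style_loss(prototypes, initial_prototypes, lam=1.0):
    """Orthogonality term plus a weighted anchor term (lam is hypothetical)."""
    return orthogonality_penalty(prototypes) + lam * anchor_penalty(
        prototypes, initial_prototypes
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    initial = rng.normal(size=(5, 16))  # stand-in for 5 class-name embeddings
    current = initial + 0.1 * rng.normal(size=(5, 16))
    print(orion_style_loss(current, initial, lam=0.5))
```

In an actual fine-tuning loop, `prototypes` would be the output of the adapted text encoder on the class names and `initial_prototypes` the output of the frozen pretrained encoder, with gradients flowing only through the low-rank adapter weights.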
