Feature Projection Learning for Better Vision-Language Reasoning

Yi Zhang; Weicheng Lin; Liang-Jie Zhang

arXiv:2601.20224 · cs.CV · January 29, 2026

TL;DR

This paper introduces Feature Projection Learning (FPL), a simple and efficient method for adapting CLIP to downstream tasks. FPL projects class prototype features into the query image feature space and reports higher accuracy than existing adaptation methods.

Contribution

The paper proposes FPL, a projection-based approach to adapting CLIP to downstream tasks that targets the main weaknesses of prior adaptation methods: limited performance, excessive learnable parameters, and long training times.

Findings

1. FPL surpasses state-of-the-art methods in accuracy.
2. FPL effectively transforms classification into a feature projection problem.
3. Empirical results demonstrate FPL's superior performance.

Abstract

Vision-language pre-trained (VLP) models trained with contrastive learning, notably CLIP, have proven highly adept at extracting generalizable visual features. To inherit the well-learned knowledge of VLP models for downstream tasks, several approaches aim to adapt them efficiently with limited supervision. However, these methods suffer from limited performance, excessive learnable parameters, or long training times, all of which hinder their effectiveness in adapting CLIP to downstream tasks. In this work, we propose a simple, efficient, and effective method called Feature Projection Learning (FPL) to address these problems. Specifically, we develop a projection model that projects class prototype features into the query image feature space and reconstructs the query image feature map. The negative average squared reconstruction error…
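The core idea in the abstract is to score each class by how well its prototype features, once projected into the query's feature space, reconstruct the query feature map. The abstract is truncated, so what follows is a minimal sketch under stated assumptions and not the authors' implementation. It assumes the negative average squared reconstruction error serves as the class score. It also swaps in a closed-form ridge-regression projection, in the style of feature-map reconstruction methods, in place of FPL's learned projection model. The names `reconstruct` and `class_scores`, the array shapes, and the regularizer `lam` are all hypothetical.

```python
import numpy as np


def reconstruct(query: np.ndarray, prototypes: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """Reconstruct each row of `query` as a linear combination of the rows of `prototypes`.

    query:      (n, d) query feature map: n spatial positions, d channels
    prototypes: (k, d) prototype features for a single class
    lam:        ridge regularizer (hypothetical hyperparameter)

    Assumption: a closed-form ridge projection stands in for FPL's learned
    projection model, whose exact form is not given in the abstract.
    """
    k = prototypes.shape[0]
    gram = prototypes @ prototypes.T + lam * np.eye(k)        # (k, k), symmetric
    # Solve gram @ W.T = P @ Q.T, i.e. W = Q P^T (P P^T + lam I)^-1
    weights = np.linalg.solve(gram, prototypes @ query.T).T  # (n, k)
    return weights @ prototypes                              # (n, d)


def class_scores(query: np.ndarray, class_prototypes: list, lam: float = 0.1) -> np.ndarray:
    """Score each class by its negative mean squared reconstruction error (higher is better)."""
    scores = []
    for prototypes in class_prototypes:
        recon = reconstruct(query, prototypes, lam)
        scores.append(-np.mean((query - recon) ** 2))
    return np.array(scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    protos_a = rng.standard_normal((4, 8))  # hypothetical class A prototypes
    protos_b = rng.standard_normal((4, 8))  # hypothetical class B prototypes
    # A query whose features lie in the span of class A's prototypes
    query = rng.standard_normal((5, 4)) @ protos_a
    scores = class_scores(query, [protos_a, protos_b], lam=1e-3)
    print("scores:", scores, "predicted class:", int(np.argmax(scores)))
```

The ridge term keeps the k × k system well conditioned when prototypes are nearly collinear. As `lam` approaches 0, with linearly independent prototypes, each reconstructed query row becomes its orthogonal projection onto the span of that class's prototypes. A class therefore scores highest when the query features lie close to that span.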


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques