Verbalized Representation Learning for Interpretable Few-Shot Generalization

Cheng-Fu Yang; Da Yin; Wenbo Hu; Heng Ji; Nanyun Peng; Bolei Zhou; Kai-Wei Chang

arXiv:2411.18651 · cs.CV · August 8, 2025

TL;DR

This paper introduces Verbalized Representation Learning (VRL), a method that automatically extracts human-interpretable features from few-shot data using vision-language models, significantly improving low-data object recognition.

Contribution

VRL is a novel approach that captures verbalized features for interpretability and enhances few-shot generalization, outperforming prior methods with less data.

Findings

01. 24% absolute improvement over state-of-the-art

02. Uses 95% less data than previous methods

03. Features outperform human-labeled attributes by 20%

Abstract

Humans recognize objects after observing only a few examples, a remarkable capability enabled by their inherent language understanding of the real-world environment. Developing verbalized and interpretable representations can significantly improve model generalization in low-data settings. In this work, we propose Verbalized Representation Learning (VRL), a novel approach for automatically extracting human-interpretable features for object recognition using few-shot data. Our method captures inter-class differences and intra-class commonalities in the form of natural language by employing a Vision-Language Model (VLM) to identify key discriminative features between different classes and shared characteristics within the same class. These verbalized features are then mapped to numeric vectors through the VLM. The resulting feature vectors can be further utilized to train and…
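The pipeline the abstract describes (verbalized features, each mapped to a number per image, then used downstream) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_vlm_score` is a hypothetical stand-in for a real VLM, the toy "images" are plain dictionaries, the feature strings are invented, and the nearest-centroid classifier is an assumption chosen for brevity, since the abstract is cut off before it says what is trained on the vectors.

```python
from typing import Callable, Dict, List, Sequence

# A toy "image": attribute description -> strength in [0, 1].
# (Assumption: a real system would pass pixels to a VLM instead.)
Image = Dict[str, float]


def toy_vlm_score(image: Image, feature: str) -> float:
    """Hypothetical stand-in for a VLM scoring how well a verbalized
    feature describes an image. Here it is a simple dictionary lookup."""
    return image.get(feature, 0.0)


def featurize(
    image: Image,
    features: Sequence[str],
    score: Callable[[Image, str], float] = toy_vlm_score,
) -> List[float]:
    """Map an image to a numeric vector with one entry per verbalized feature."""
    return [score(image, f) for f in features]


def fit_centroids(
    vectors: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(vector: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(vector, centroids[label]))


# Invented verbalized features, standing in for what a VLM might produce.
FEATURES = ["has striped fur", "has a long mane", "has pointed ears"]

# Few-shot training set: two toy examples per class.
train_images = [
    {"has striped fur": 0.9, "has pointed ears": 0.2},
    {"has striped fur": 0.8, "has pointed ears": 0.3},
    {"has a long mane": 0.9, "has pointed ears": 0.4},
    {"has a long mane": 0.7, "has pointed ears": 0.3},
]
train_labels = ["zebra", "zebra", "horse", "horse"]

centroids = fit_centroids(
    [featurize(img, FEATURES) for img in train_images], train_labels
)

query = {"has striped fur": 0.85, "has pointed ears": 0.25}
prediction = predict(featurize(query, FEATURES), centroids)
print(prediction)
```

Because each vector dimension corresponds to one natural-language feature, a prediction can be inspected by reading off which descriptions scored highly, which is the interpretability property the abstract emphasizes.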

Code & Models

Repositories

joeyy5588/vrl (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis