Learning to Name Classes for Vision and Language Models

Sarah Parisot; Yongxin Yang; Steven McDonagh

arXiv:2304.01830 · cs.CV · April 5, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a method to learn optimal class-specific word embeddings from visual data, improving zero-shot recognition, dataset adaptation, and handling ambiguous class names in vision-language models.

Contribution

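It proposes adapting class names by learning, for each class, a word embedding from visual content while the rest of the vision-language model stays frozen. This makes the model easier to adapt to new data without giving up zero-shot recognition of new classes.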

Findings

1. Significant performance improvements in image classification and object detection
2. Effective adaptation to new datasets with minimal fine-tuning
3. Ability to correct or refine class names based on learned embeddings
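One way to see how a learned embedding might "refine" a class name is to look up the vocabulary words whose embeddings lie closest to it. The sketch below is a hypothetical illustration, not the paper's procedure: it assumes cosine similarity as the distance, and the helper `nearest_words`, the four-word vocabulary and its 3-d vectors are invented toy values standing in for a real token-embedding table.

```python
import numpy as np


def nearest_words(learned_emb, vocab_embs, vocab_words, k=3):
    """Return the k vocabulary words most cosine-similar to a learned
    class embedding (hypothetical helper for illustration)."""
    q = learned_emb / np.linalg.norm(learned_emb)
    v = vocab_embs / np.linalg.norm(vocab_embs, axis=1, keepdims=True)
    sims = v @ q                      # cosine similarity to every vocabulary word
    order = np.argsort(-sims)[:k]     # indices of the k highest similarities
    return [(vocab_words[i], float(sims[i])) for i in order]


# Toy 3-d vocabulary (assumed values, not real model embeddings).
vocab_words = ["crane", "bird", "machine", "heron"]
vocab_embs = np.array([
    [1.0, 1.0, 0.0],   # "crane": ambiguous between bird and machine
    [0.0, 1.0, 0.0],   # "bird"
    [1.0, 0.0, 0.0],   # "machine"
    [0.1, 0.9, 0.3],   # "heron"
])
# Suppose training on bird images moved the "crane" embedding toward the bird direction.
learned = np.array([0.1, 1.0, 0.2])
print(nearest_words(learned, vocab_embs, vocab_words, k=2))
```

In this toy case the learned vector's nearest neighbours are "heron" and "bird" rather than the original, ambiguous "crane", which is the kind of signal a name-refinement step could surface.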

Abstract

Large-scale vision and language models can achieve impressive zero-shot recognition performance by mapping class-specific text queries to image content. Two distinct challenges remain, however: high sensitivity to the choice of handcrafted class names that define queries, and the difficulty of adapting to new, smaller datasets. To address these problems, we propose to leverage available data to learn, for each class, an optimal word embedding as a function of the visual content. By learning new word embeddings on an otherwise frozen model, we are able to retain zero-shot capabilities for new classes, easily adapt models to new datasets, and adjust potentially erroneous, non-descriptive or ambiguous class names. We show that our solution can easily be integrated into image classification and object detection pipelines, yields significant performance gains in multiple…
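The core idea in the abstract, learning one word embedding per class while everything else stays frozen, can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the whole text encoder is replaced by a hypothetical frozen linear map `w_text` applied directly to the class embedding (a real pipeline would place the embedding inside a prompt and run a frozen transformer), text features are left unnormalised to keep the gradient short, and all data are synthetic.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def learn_class_embeddings(img_feats, labels, init_embs, w_text,
                           scale=10.0, lr=0.05, steps=300):
    """Fit one word embedding per class by gradient descent on cross-entropy.

    img_feats: (N, d_joint) frozen, L2-normalised image features
    labels:    (N,) integer class indices
    init_embs: (C, d_word) embeddings of the handcrafted class names (start point)
    w_text:    (d_joint, d_word) frozen linear map, a stand-in for the text encoder
    Only the class embeddings change; img_feats and w_text are never updated.
    """
    emb = init_embs.copy()
    n, c = len(labels), len(init_embs)
    onehot = np.eye(c)[labels]
    for _ in range(steps):
        text_feats = emb @ w_text.T                     # (C, d_joint): one query per class
        logits = scale * img_feats @ text_feats.T       # (N, C): image-to-class scores
        grad_logits = (softmax(logits) - onehot) / n    # gradient of mean cross-entropy
        grad_text = scale * grad_logits.T @ img_feats   # (C, d_joint)
        emb -= lr * (grad_text @ w_text)                # chain rule through frozen w_text
    return emb


def accuracy(img_feats, labels, emb, w_text):
    preds = np.argmax(img_feats @ (emb @ w_text.T).T, axis=1)
    return float(np.mean(preds == labels))


# Toy setup (assumed): 3 classes, 8-d word space, 6-d joint space.
rng = np.random.default_rng(0)
d_word, d_joint, n_classes = 8, 6, 3
q, _ = np.linalg.qr(rng.normal(size=(d_word, d_joint)))
w_text = q.T                                    # (6, 8) with orthonormal rows
protos = np.eye(d_joint)[:n_classes]            # one visual direction per class
labels = np.repeat(np.arange(n_classes), 30)
img_feats = l2_normalize(protos[labels] + 0.2 * rng.normal(size=(len(labels), d_joint)))
init_embs = rng.normal(size=(n_classes, d_word))  # poorly chosen "class names"

learned = learn_class_embeddings(img_feats, labels, init_embs, w_text)
print("before:", accuracy(img_feats, labels, init_embs, w_text))
print("after: ", accuracy(img_feats, labels, learned, w_text))
```

Because `w_text` and the image features never change, a class that was not trained can still be scored with its original name embedding, which mirrors the abstract's point about retaining zero-shot capability for new classes.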

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications