Generate, Transduct, Adapt: Iterative Transduction with VLMs

Oindrila Saha; Logan Lawrence; Grant Van Horn; Subhransu Maji

arXiv:2501.06031 · cs.CV · October 15, 2025

TL;DR

This paper introduces GTA-CLIP, an iterative transductive method that leverages language models to improve zero-shot and few-shot classification accuracy with vision-language models like CLIP.

Contribution

GTA-CLIP is a novel iterative approach that incorporates language supervision into transductive zero-shot learning, enhancing performance over existing methods.

Findings

1. Achieves average improvements of 8.6% and 3.7% across 12 datasets and 3 CLIP encoders.

2. Demonstrates effectiveness in both zero-shot and few-shot settings.

3. Ablation studies confirm the importance of each iterative step.

Abstract

Transductive zero-shot learning with vision-language models leverages image-image similarities within the dataset to achieve better classification accuracy than the inductive setting. However, little work explores the structure of the language space in this context. We propose GTA-CLIP, a novel technique that incorporates supervision from language models for joint transduction in language and vision spaces. Our approach is iterative and consists of three steps: (i) incrementally exploring the attribute space by querying language models, (ii) an attribute-augmented transductive inference procedure, and (iii) fine-tuning the language and vision encoders based on inferred labels within the dataset. Through experiments with CLIP encoders, we demonstrate that GTA-CLIP yields an average performance improvement of 8.6% and 3.7% across 12 datasets and 3 encoders, over CLIP…
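The abstract describes step (ii) only at a high level. As a rough illustration of what attribute-augmented transductive inference can look like, here is a minimal pure-Python sketch. It assumes, hypothetically, that each class is represented by the mean text embedding of its language-model-generated attribute descriptions, and that all test images are labeled jointly by an EM-style soft clustering whose centroids stay anchored to those text prototypes. The function names and the `temperature`, `n_iters`, and `text_weight` parameters are illustrative assumptions, not GTA-CLIP's actual procedure. Attribute generation (step i) and encoder fine-tuning (step iii) are omitted.

```python
import math

def normalize(v):
    # L2-normalize a vector, as is usual for CLIP embeddings.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores, temperature):
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def class_prototypes(attribute_embeddings):
    # Hypothetical attribute augmentation: each class prototype is the
    # normalized mean of the text embeddings of that class's attributes.
    protos = []
    for embs in attribute_embeddings:
        dim = len(embs[0])
        mean = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
        protos.append(normalize(mean))
    return protos

def transductive_predict(image_embeddings, attribute_embeddings,
                         temperature=0.1, n_iters=5, text_weight=0.5):
    # Illustrative sketch only, not the GTA-CLIP algorithm.
    images = [normalize(x) for x in image_embeddings]
    text_protos = class_prototypes(attribute_embeddings)
    protos = list(text_protos)  # start from the zero-shot (inductive) classifier
    for _ in range(n_iters):
        # E-step: soft-assign every unlabeled test image to each class.
        probs = [softmax([dot(x, p) for p in protos], temperature) for x in images]
        # M-step: re-estimate each prototype from the whole test set,
        # mixed with its text prototype so classes do not drift away.
        new_protos = []
        for k, t in enumerate(text_protos):
            weight = sum(p[k] for p in probs)
            dim = len(t)
            visual = [sum(p[k] * x[d] for p, x in zip(probs, images)) for d in range(dim)]
            mixed = [
                text_weight * t[d]
                + (1 - text_weight) * (visual[d] / weight if weight > 0 else t[d])
                for d in range(dim)
            ]
            new_protos.append(normalize(mixed))
        protos = new_protos
    probs = [softmax([dot(x, p) for p in protos], temperature) for x in images]
    # Predicted label = index of the most probable class for each image.
    return [max(range(len(p)), key=p.__getitem__) for p in probs]

# Toy 2-D example: two classes, each described by two attribute embeddings.
attrs = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
images = [[0.95, 0.05], [0.8, 0.3], [0.2, 0.9], [0.1, 1.0]]
print(transductive_predict(images, attrs))  # [0, 0, 1, 1]
```

The `text_weight` mixing is one simple way to keep language-space information in the loop while the visual clusters refine the decision boundaries. Under this sketch's assumptions, an outer loop that regenerates attributes and fine-tunes the encoders on the predicted labels would wrap around `transductive_predict`.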


Taxonomy

Topics: Natural Language Processing Techniques · Neural Networks and Applications

Methods: Transductive Inference · Contrastive Language-Image Pre-training