Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation

Chun-Mei Feng; Yuanyang He; Jian Zou; Salman Khan; Huan Xiong; Zhen Li; Wangmeng Zuo; Rick Siow Mong Goh; Yong Liu

arXiv:2412.09706 · cs.CV · December 30, 2024

TL;DR

This paper introduces IT3A, a multi-modal test-time adaptation method that uses pre-trained generative models to augment each test sample with both text and images, improving accuracy under distribution shifts by a reported 5.50% over state-of-the-art TPT methods.

Contribution

IT3A combines augmentations from pre-trained vision and language models and uses cosine similarity filtering to discard spurious augmentations, extending test-time adaptation beyond single-modality methods.
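The cosine similarity filtering mentioned above can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the function names, the `keep_ratio` parameter, and the rule of keeping the top-ranked fraction are all assumptions made for illustration, and the feature vectors stand in for embeddings that a pre-trained encoder would produce.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filter_augmentations(test_feat, aug_feats, keep_ratio=0.5):
    """Rank augmentations by similarity to the test sample and keep the top share.

    keep_ratio is a hypothetical hyperparameter, not a value from the paper.
    Returns the indices of the kept augmentations (most similar first)
    together with every similarity score.
    """
    sims = [cosine_similarity(test_feat, f) for f in aug_feats]
    n_keep = max(1, round(keep_ratio * len(sims)))
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return ranked[:n_keep], sims

# Toy 2-D features: index 1 is orthogonal to the test sample, index 3 points
# in the opposite direction, so both are treated as spurious here.
test_feat = [1.0, 0.0]
aug_feats = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0], [-1.0, 0.0]]
kept, sims = filter_augmentations(test_feat, aug_feats, keep_ratio=0.5)
print(kept)
```

Ranking by similarity rather than applying a fixed threshold is only one possible design; a threshold would let the number of retained augmentations vary per test sample.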

Findings

01. Outperforms state-of-the-art TPT methods by 5.50% in accuracy.

02. Effectively filters spurious augmentations using cosine similarity.

03. Enhances model robustness to distribution shifts and domain gaps.

Abstract

Existing test-time prompt tuning (TPT) methods focus on single-modality data, primarily enhancing images and using confidence ratings to filter out inaccurate images. However, while image generation models can produce visually diverse images, single-modality data enhancement techniques still fail to capture the comprehensive knowledge provided by different modalities. Additionally, we note that the performance of TPT-based methods drops significantly when the number of augmented images is limited, which is not unusual given the computational expense of generative augmentation. To address these issues, we introduce IT3A, a novel test-time adaptation method that utilizes a pre-trained generative model for multi-modal augmentation of each test sample from unknown new domains. By combining augmented data from pre-trained vision and language models, we enhance the ability of the model to…

Code & Models

Repositories

chunmeifeng/difftpt (PyTorch, official)

Taxonomy

Topics: Subtitles and Audiovisual Media · Advanced Vision and Imaging · Advanced Image Processing Techniques

Methods: Focus · Adapter