Ramen: Robust Test-Time Adaptation of Vision-Language Models with Active Sample Selection

Wenxuan Bao; Yanjun Zhao; Xiyuan Yang; Jingrui He

arXiv:2604.21728·cs.CV·April 28, 2026

TL;DR

Ramen is a framework that makes test-time adaptation of vision-language models more robust by actively selecting relevant samples for each adaptation step. It is designed for test data that mix several domains.

Contribution

The paper introduces an active sample selection method, supported by an embedding-gradient cache, for efficient and robust test-time adaptation in mixed-domain scenarios.

Findings

01 Ramen outperforms existing methods on multiple benchmarks.

02 It maintains strong performance under mixed-domain test data.

03 The embedding-gradient cache improves adaptation efficiency.

Abstract

Pretrained vision-language models such as CLIP exhibit strong zero-shot generalization but remain sensitive to distribution shifts. Test-time adaptation adapts models during inference without access to source data or target labels, offering a practical way to handle such shifts. However, existing methods typically assume that test samples come from a single, consistent domain, while in practice, test data often include samples from mixed domains with distinct characteristics. Consequently, their performance degrades under mixed-domain settings. To address this, we present Ramen, a framework for robust test-time adaptation through active sample selection. For each incoming test sample, Ramen retrieves a customized batch of relevant samples from previously seen data based on two criteria: domain consistency, which ensures that adaptation focuses on data from similar domains, and…
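
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that domain consistency can be approximated by cosine similarity between image embeddings, and the function name `select_domain_consistent` and the toy cache are hypothetical. The second selection criterion is truncated in the abstract above and is not modeled here.

```python
import numpy as np


def select_domain_consistent(query, cache, k):
    """Return indices of the k cached embeddings most similar to `query`.

    Assumption: cosine similarity between embeddings stands in for
    "domain consistency"; the paper's actual criterion may differ.
    """
    cache = np.asarray(cache, dtype=float)
    query = np.asarray(query, dtype=float)
    # Normalize so that the dot product equals cosine similarity.
    cache_n = cache / np.linalg.norm(cache, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    sims = cache_n @ query_n
    k = min(k, len(sims))
    # Sort by descending similarity and keep the top k.
    return np.argsort(-sims, kind="stable")[:k]


# Toy cache: two embeddings near the x-axis, two near the y-axis,
# standing in for previously seen samples from two domains.
cache = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
batch_idx = select_domain_consistent(np.array([1.0, 0.05]), cache, k=2)
```

In this toy setup the query lies close to the first two cached embeddings, so the customized batch is drawn from that group only, which is the behavior the abstract attributes to the domain-consistency criterion.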

Code & Models

Repositories

baowenxuan/Ramen (GitHub)