GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning
Yujun Li, Hongyuan Zhang, Yuan Yuan

TL;DR
This paper introduces GRPO-TTA, a test-time adaptation method for vision-language models that uses group-wise policy optimization with tailored alignment and dispersion rewards to improve performance under distribution shifts.
Contribution
It reformulates class-specific prompt prediction as a group policy optimization problem and designs tailored reward functions for effective test-time visual encoder tuning.
Findings
GRPO-TTA outperforms existing TTA methods across multiple benchmarks.
It achieves significant gains under natural distribution shifts.
The approach enables probability-driven optimization without ground-truth labels.
Abstract
Group Relative Policy Optimization (GRPO) has recently shown strong performance in post-training large language models and vision-language models. This raises the question of whether GRPO can also substantially improve test-time adaptation (TTA) of vision-language models. In this paper, we propose Group Relative Policy Optimization for Test-Time Adaptation (GRPO-TTA), which adapts GRPO to the TTA setting by reformulating class-specific prompt prediction as a group-wise policy optimization problem. Specifically, we construct output groups by sampling top-K class candidates from CLIP similarity distributions, enabling probability-driven optimization without access to ground-truth labels. Moreover, we design reward functions tailored to test-time adaptation, including alignment rewards and dispersion rewards, to guide effective visual encoder tuning. Extensive experiments across diverse…
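The group construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity values are made up, the candidate group is taken as the deterministic top-K classes (the paper may sample them differently), and the reward is a placeholder (the softmax probability of each candidate) standing in for the alignment and dispersion rewards, whose exact form the abstract does not give. Only the group-relative advantage, each reward standardized by the group mean and standard deviation, follows the standard GRPO formulation. All function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of similarity scores."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_group(similarities, k):
    """Indices of the k most similar classes; this forms the candidate group."""
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=True)
    return order[:k]

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: (reward - group mean) / group std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]

# Hypothetical image-text cosine similarities for five classes.
similarities = [0.31, 0.22, 0.28, 0.10, 0.25]
probs = softmax(similarities, temperature=0.01)  # temperature is an assumption

group = top_k_group(similarities, k=3)

# Placeholder reward (assumption): the candidate's softmax probability.
# No ground-truth label is used anywhere in this computation.
rewards = [probs[i] for i in group]
advantages = group_relative_advantages(rewards)

for cls, adv in zip(group, advantages):
    print(f"class {cls}: advantage {adv:+.3f}")
```

Because advantages are computed relative to the group, candidates the model already favors receive positive advantages and the rest negative ones, which is what allows an update signal to be formed without labels.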
