Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection

Lihua Zhou; Mao Ye; Xiatian Zhu; Nianxin Li; Changyi Ma; Shuaifeng Li; Yitong Qin; Hongbin Liu; Jiebo Luo; Zhen Lei

arXiv:2605.04531·cs.CV·May 7, 2026

TL;DR

This paper introduces RGSE, a training-free, reward-guided semantic evolution method that refines text embeddings at test time to improve open-vocabulary object detection under distribution shifts.

Contribution

It proposes a novel test-time semantic alignment approach based on evolutionary search, avoiding both costly backpropagation and reliance on external memory.

Findings

01. RGSE achieves state-of-the-art results on multiple detection benchmarks.

02. It refines text embeddings efficiently without backpropagation.

03. The method adds minimal computational overhead.

Abstract

Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and the shifted visual embeddings of region proposals. Recent test-time adaptive object detection methods for VLM-based detectors either rely on costly backpropagation or bypass semantic misalignment via external memory; none directly and efficiently aligns text and vision in a training-free manner. To address this, we propose Reward-Guided Semantic Evolution (RGSE), a training-free framework that directly refines the text embeddings at test time. Inspired by evolutionary search, RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings as candidate variants, evaluates them via cosine similarity with current and historical…
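The perturb-and-score idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract is truncated, so the Gaussian perturbation, the mean-cosine reward, the keep-the-best selection rule, and every name and parameter here (`evolve_text_embedding`, `sigma`, `n_candidates`, `n_generations`) are assumptions made for illustration only. It scores candidates against the current visual embeddings alone and uses no gradients.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine similarity between the rows of a (n, d) and the rows of b (m, d)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def evolve_text_embedding(text_emb, visual_embs, n_generations=5,
                          n_candidates=32, sigma=0.05, rng=None):
    """Gradient-free refinement of one text embedding (hypothetical sketch).

    text_emb:    (d,) text embedding for one class.
    visual_embs: (m, d) visual embeddings of region proposals.
    Returns the refined embedding and its reward (mean cosine similarity).
    """
    rng = np.random.default_rng() if rng is None else rng
    best = np.asarray(text_emb, dtype=float)
    best_reward = float(cosine_sim(best[None, :], visual_embs).mean())

    for _ in range(n_generations):
        # Assumed perturbation: isotropic Gaussian noise around the current best.
        noise = rng.normal(scale=sigma, size=(n_candidates, best.shape[0]))
        candidates = best + noise
        # Assumed reward: mean cosine similarity to the visual embeddings.
        rewards = cosine_sim(candidates, visual_embs).mean(axis=1)
        i = int(np.argmax(rewards))
        # Keep a candidate only if it beats the current best, so the
        # reward never decreases from one generation to the next.
        if rewards[i] > best_reward:
            best, best_reward = candidates[i], float(rewards[i])

    return best, best_reward
```

A call such as `evolve_text_embedding(text_emb, region_embs)` returns an embedding whose reward is at least that of the input, because a candidate replaces the current best only when it scores higher. How RGSE itself combines candidates, and how it uses historical information, is not recoverable from the truncated abstract.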
