Test-Time Adaptive Object Detection with Foundation Model

Yingjie Gao; Yanan Zhang; Zhi Cai; Di Huang

arXiv:2510.25175·cs.CV·October 30, 2025

TL;DR

This paper introduces a foundation model-powered test-time adaptive object detection method that eliminates source data reliance and handles cross-domain and cross-category adaptation through a multi-modal prompt-based framework and dynamic memory modules.

Contribution

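It proposes what the authors describe as the first foundation model-based approach to test-time adaptive object detection. The method removes the reliance on source data and the assumption that source and target domains share an identical category space, and it adapts in a parameter-efficient manner.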

Findings

01 Outperforms previous state-of-the-art methods on cross-dataset benchmarks.

02 Effectively adapts to arbitrary cross-domain and cross-category data.

03 Maintains high-quality pseudo-labels using dynamic memory modules.

Abstract

In recent years, test-time adaptive object detection has attracted increasing attention due to its unique advantages in online domain adaptation, which aligns more closely with real-world application scenarios. However, existing approaches heavily rely on source-derived statistical characteristics while making the strong assumption that the source and target domains share an identical category space. In this paper, we propose the first foundation model-powered test-time adaptive object detection method that eliminates the need for source data entirely and overcomes traditional closed-set limitations. Specifically, we design a Multi-modal Prompt-based Mean-Teacher framework for vision-language detector-driven test-time adaptation, which incorporates text and visual prompt tuning to adapt both language and vision representation spaces on the test data in a parameter-efficient manner.…
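The abstract names a Mean-Teacher framework with tunable text and visual prompts but does not give its update rules in this excerpt. As a rough illustration only, the sketch below shows the generic mean-teacher pattern: a student's prompt parameters are tuned on test data, and the teacher tracks them through an exponential moving average (EMA). The parameter names (`text_prompt`, `visual_prompt`), the `momentum` value, and the plain-list representation are all assumptions made for this example, not details from the paper.

```python
from typing import Dict, List

# Hypothetical container: prompt name -> flat list of parameter values.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, momentum: float = 0.999) -> None:
    """Update the teacher in place: teacher <- m * teacher + (1 - m) * student.

    This is the standard mean-teacher EMA rule; whether the paper uses
    exactly this form is an assumption.
    """
    for name, student_vals in student.items():
        teacher_vals = teacher[name]
        for i, s in enumerate(student_vals):
            teacher_vals[i] = momentum * teacher_vals[i] + (1.0 - momentum) * s


# Only the (assumed) prompt parameters are adapted; the detector backbone
# would stay frozen, which is what makes the scheme parameter-efficient.
teacher = {"text_prompt": [0.0, 0.0], "visual_prompt": [1.0]}
student = {"text_prompt": [1.0, 2.0], "visual_prompt": [0.0]}
ema_update(teacher, student, momentum=0.9)
```

In a setup like this, the slowly moving teacher would typically produce the pseudo-labels that supervise the student on each incoming test batch.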
