Test-Time Adaptive Object Detection with Foundation Model
Yingjie Gao, Yanan Zhang, Zhi Cai, Di Huang

TL;DR
This paper introduces a foundation model-powered test-time adaptive object detection method that eliminates source data reliance and handles cross-domain and cross-category adaptation through a multi-modal prompt-based framework and dynamic memory modules.
Contribution
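It proposes the first foundation model-based approach to test-time adaptive object detection, removing the reliance on source-derived statistics and the assumption that source and target domains share an identical category space, while adapting in a parameter-efficient manner through prompt tuning.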
Findings
Outperforms previous state-of-the-art methods on cross-dataset benchmarks.
Effectively adapts to arbitrary cross-domain and cross-category data.
Maintains high-quality pseudo-labels using dynamic memory modules.
Abstract
In recent years, test-time adaptive object detection has attracted increasing attention due to its unique advantages in online domain adaptation, which aligns more closely with real-world application scenarios. However, existing approaches heavily rely on source-derived statistical characteristics while making the strong assumption that the source and target domains share an identical category space. In this paper, we propose the first foundation model-powered test-time adaptive object detection method that eliminates the need for source data entirely and overcomes traditional closed-set limitations. Specifically, we design a Multi-modal Prompt-based Mean-Teacher framework for vision-language detector-driven test-time adaptation, which incorporates text and visual prompt tuning to adapt both language and vision representation spaces on the test data in a parameter-efficient manner.…
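The Mean-Teacher idea named in the abstract can be illustrated with a minimal sketch. In a generic Mean-Teacher setup, a teacher copy of the trainable parameters is kept as an exponential moving average (EMA) of the student's parameters, and the teacher's predictions serve as pseudo-labels for the student. The sketch below shows only that EMA step, applied to prompt parameters stored as plain lists of floats. The function name `ema_update`, the parameter names `text_prompt` and `visual_prompt`, and the momentum value are assumptions made for illustration; this is not the authors' implementation.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, momentum: float = 0.999) -> Params:
    """Update teacher parameters in place as an EMA of the student's parameters.

    Hypothetical helper: in a prompt-tuning setting only the small set of
    prompt parameters would be passed in, with the detector backbone frozen.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    for name, student_vals in student.items():
        teacher_vals = teacher[name]
        if len(teacher_vals) != len(student_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        for i, s in enumerate(student_vals):
            # teacher <- momentum * teacher + (1 - momentum) * student
            teacher_vals[i] = momentum * teacher_vals[i] + (1.0 - momentum) * s
    return teacher


# Usage: illustrative prompt parameters (names and values are assumptions).
teacher = {"text_prompt": [0.0, 0.0], "visual_prompt": [1.0]}
student = {"text_prompt": [1.0, 2.0], "visual_prompt": [3.0]}
ema_update(teacher, student, momentum=0.5)
```

A momentum close to 1 makes the teacher change slowly, which is the usual motivation for using it as the source of pseudo-labels during online adaptation.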
