FACTOR: Counterfactual Training-Free Test-Time Adaptation for Open-Vocabulary Object Detection
Kaixiang Zhao, Mao Ye, Lihua Zhou, Hu Wang, Luping Ji, Song Tang, Xiatian Zhu

TL;DR
FACTOR is a lightweight, training-free test-time adaptation method for open-vocabulary object detection that uses counterfactual reasoning to improve robustness against distribution shifts without online optimization.
Contribution
It introduces a novel counterfactual reasoning framework that perturbs non-causal attributes to enhance detection robustness without parameter updates.
Findings
Outperforms prior TTA methods on PASCAL-C, COCO-C, and FoggyCityscapes.
Effectively suppresses attribute-dependent predictions to handle distribution shifts.
Demonstrates that explicit counterfactual reasoning improves robustness.
Abstract
Open-vocabulary object detection often fails under distribution shifts, as it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. By perturbing test images along non-causal attributes and comparing region-level predictions between original and counterfactual views, FACTOR quantifies attribute sensitivity, semantic relevance, and prediction variation to selectively suppress attribute-dependent predictions, all without parameter updates. Experiments on…
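The general idea of comparing original and counterfactual views can be illustrated with a minimal sketch. This is not the FACTOR scoring rule: the abstract does not give the formulas for attribute sensitivity, semantic relevance, or prediction variation, so the code below uses only a simple prediction-variation term. The function names, the brightness perturbation, the exponential down-weighting, and the `strength` parameter are all assumptions made for illustration, and the per-region class scores are assumed to come from some detector run on the same region proposals in every view.

```python
import numpy as np


def adjust_brightness(image, factor):
    """Hypothetical non-causal perturbation: scale pixel intensities.

    `image` is an array of pixel values in [0, 255]; the result is
    clipped back to that range and returned as float32.
    """
    return np.clip(np.asarray(image, dtype=np.float32) * factor, 0.0, 255.0)


def suppress_attribute_dependent(orig_scores, cf_scores, strength=1.0):
    """Down-weight region scores that change under counterfactual views.

    orig_scores: (R, C) class scores for R regions on the original image.
    cf_scores:   (K, R, C) scores for the same regions on K perturbed views.
    strength:    assumed hyperparameter controlling how hard to suppress.

    Illustrative stand-in only: it uses mean absolute score change as the
    sole signal, which is an assumption and not the paper's formulation.
    """
    orig_scores = np.asarray(orig_scores, dtype=np.float64)
    cf_scores = np.asarray(cf_scores, dtype=np.float64)

    # Mean absolute change of each (region, class) score across the K views.
    variation = np.abs(cf_scores - orig_scores[None, :, :]).mean(axis=0)

    # Stable predictions keep weight 1; unstable ones decay towards 0.
    weights = np.exp(-strength * variation)
    return orig_scores * weights


if __name__ == "__main__":
    orig = np.array([[0.9, 0.1], [0.8, 0.2]])
    # One counterfactual view: region 0 is unchanged, region 1 loses
    # confidence in class 0 once the attribute is perturbed.
    cf = np.array([[[0.9, 0.1], [0.2, 0.2]]])
    print(suppress_attribute_dependent(orig, cf))
```

In this toy input, the first region's scores are left untouched because they do not change under the perturbation, while the second region's class-0 score is reduced. No model parameters are involved at any point, which matches the training-free setting the abstract describes.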
