FACTOR: Counterfactual Training-Free Test-Time Adaptation for Open-Vocabulary Object Detection

Kaixiang Zhao; Mao Ye; Lihua Zhou; Hu Wang; Luping Ji; Song Tang; Xiatian Zhu

arXiv:2605.03294 · cs.CV · May 6, 2026


TL;DR

FACTOR is a lightweight, training-free test-time adaptation method for open-vocabulary object detection that uses counterfactual reasoning to improve robustness against distribution shifts without online optimization.

Contribution

FACTOR introduces a counterfactual reasoning framework that perturbs test images along non-causal attributes (e.g., brightness, texture) and suppresses attribute-dependent predictions, improving detection robustness without parameter updates.

Findings

01. Outperforms prior TTA methods on PASCAL-C, COCO-C, and FoggyCityscapes.

02. Effectively suppresses attribute-dependent predictions to handle distribution shifts.

03. Demonstrates that explicit counterfactual reasoning improves robustness.

Abstract

Open-vocabulary object detection often fails under distribution shifts, as it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. By perturbing test images along non-causal attributes and comparing region-level predictions between original and counterfactual views, FACTOR quantifies attribute sensitivity, semantic relevance, and prediction variation to selectively suppress attribute-dependent predictions, without parameter updates. Experiments on…
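The general idea of comparing region-level predictions between an original and a counterfactual view can be illustrated with a minimal sketch. This is not the authors' implementation: the brightness perturbation, the exponential down-weighting rule, the `strength` parameter, and the toy `toy_score_fn` detector are all assumptions made for illustration, and the sketch models only prediction variation, leaving out the attribute-sensitivity and semantic-relevance terms the abstract mentions.

```python
import numpy as np


def perturb_brightness(image, factor):
    """Build a counterfactual view by scaling intensities (clipped to [0, 1])."""
    return np.clip(image * factor, 0.0, 1.0)


def counterfactual_reweight(image, score_fn, factors=(0.6, 1.4), strength=5.0):
    """Down-weight region scores that change under a brightness perturbation.

    score_fn is a hypothetical stand-in for a frozen detector. It maps an image
    to an array of shape (num_regions, num_classes) for a fixed set of regions.
    The weighting rule exp(-strength * variation) is an assumption, not the
    formula used in the paper. No parameters are updated anywhere.
    """
    original = score_fn(image)
    views = np.stack([score_fn(perturb_brightness(image, f)) for f in factors])
    # Mean absolute score change between the original and each counterfactual view.
    variation = np.abs(views - original).mean(axis=0)
    weights = np.exp(-strength * variation)
    return original * weights, variation


def toy_score_fn(image):
    """Toy detector with two regions and one class (purely illustrative).

    Region 0 scores by mean brightness, so it depends on the attribute.
    Region 1 scores by a left/right intensity ratio, which is unchanged by
    uniform brightness scaling as long as no pixel is clipped.
    """
    left, right = image[:, :2], image[:, 2:]
    brightness_score = image.mean()
    ratio_score = left.mean() / (left.mean() + right.mean())
    return np.array([[brightness_score], [ratio_score]])


# A 4x4 image: dark left half, brighter right half.
image = np.concatenate([np.full((4, 2), 0.2), np.full((4, 2), 0.4)], axis=1)
original = toy_score_fn(image)
adjusted, variation = counterfactual_reweight(image, toy_score_fn)
```

In this toy setup the brightness-driven score for region 0 is reduced, while the ratio-based score for region 1 is left essentially unchanged, which mirrors the selective suppression described in the abstract at a very coarse level.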
