Visual Accommodation: Rethinking Image Scale as a Learnable Variable for Object Detection
Daeun Seo, Hoeseok Yang, Sihyeong Park, Hyungshin Kim

TL;DR
This paper introduces Ciliary-DETR, a framework inspired by biological accommodation that dynamically adjusts the input image scale at inference time to improve the robustness and flexibility of object detection.
Contribution
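It presents a learnable scale predictor for test-time resolution adjustment. Because the optimal input scale is unobservable under standard training setups, the predictor is guided by a parametric formulation of the desired scaling behavior and the loss-driven objectives derived from it.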
Findings
Enables flexible single-pass inference with dynamic scale adjustment.
Improves robustness of object detection across varying input scales.
Bridges the gap between training robustness and test-time adaptation.
Abstract
We propose Ciliary-DETR (previous name: Elastic-DETR), a framework for test-time resolution adjustment analogous to biological accommodation. While multi-scale data augmentation improves robustness to scale variation, modern detectors rely on fixed inference resolutions, potentially limiting flexibility and robustness. Similar to the ciliary muscle, we introduce a lightweight scale predictor that dynamically estimates test-time scale factors across a wide range of input scales. The core challenge is that the optimal input scale is inherently unobservable under standard training setups. To address this challenge, we introduce a parametric formulation of desired scaling behavior, leading to loss-driven objectives that guide scale optimization. Overall, our method enables flexible and efficient single-pass inference, bridging the gap between training-time robustness and test-time…
