AnyDepth-DETR/-YOLO: Any-depth object detection with a single network
Woochul Kang, Hyungseop Lee, Jiho Lee

TL;DR
This paper introduces an object detection network whose depth can be adjusted at inference time to meet different accuracy-efficiency requirements, using a single trained model and no retraining.
Contribution
It proposes an any-depth detection framework in which each backbone and neck stage is split into an essential path and a skippable refinement path, enabling a continuous range of accuracy-efficiency trade-offs within one network.
Findings
Full-depth models match or surpass SOTA baselines.
Efficient configurations achieve up to 1.82x speedup with minimal AP loss.
A single trained model supports multiple deployment scenarios.
Abstract
Modern object detectors are static, fixed-depth networks optimized for a single operating point, requiring separate models for different deployment scenarios. We present an any-depth detection framework that enables a single network to span a continuous range of accuracy-efficiency trade-offs by controlling depth at inference time without retraining. Each backbone and neck stage is divided into an essential path, which always executes, and a skippable refinement path; this decomposition preserves the full multi-scale feature hierarchy at every depth configuration, unlike conventional early exiting that discards entire stages. To train such a network, jointly optimizing many sub-networks of varying depth introduces conflicting gradient signals. We address this via self-distillation between only the two extremes, with prediction-level and feature-level alignment losses that enforce…
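The essential/refinement split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `DualPathStage`, `build_toy_stages`, and `run_any_depth` are hypothetical, features are plain 1-D lists instead of image tensors, and the residual-style merge of the refinement output is an assumption made for illustration. The point it shows is that skipping refinement paths changes feature values but never removes a scale from the hierarchy.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Feature = List[float]  # toy stand-in for a feature map


@dataclass
class DualPathStage:
    """One stage with an always-executed path and a skippable path (hypothetical)."""
    essential: Callable[[Feature], Feature]
    refinement: Callable[[Feature], Feature]

    def __call__(self, x: Feature, use_refinement: bool) -> Feature:
        y = self.essential(x)  # always runs, so this scale always exists
        if use_refinement:
            # Assumption: refinement is merged as a residual update.
            r = self.refinement(y)
            y = [a + b for a, b in zip(y, r)]
        return y


def downsample(x: Feature) -> Feature:
    """Halve the resolution by averaging neighbouring pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def build_toy_stages(num_stages: int = 3) -> List[DualPathStage]:
    """Build stages whose refinement path adds 10% of the essential output."""
    return [
        DualPathStage(essential=downsample,
                      refinement=lambda y: [0.1 * v for v in y])
        for _ in range(num_stages)
    ]


def run_any_depth(stages: Sequence[DualPathStage], x: Feature,
                  depth_mask: Sequence[bool]) -> List[Feature]:
    """Run all stages; depth_mask[i] says whether stage i runs its refinement path."""
    features = []
    for stage, keep in zip(stages, depth_mask):
        x = stage(x, keep)
        features.append(x)  # one feature per scale, at every depth setting
    return features


if __name__ == "__main__":
    stages = build_toy_stages()
    x = [float(i) for i in range(16)]
    for mask in ([True, True, True], [True, False, True], [False, False, False]):
        feats = run_any_depth(stages, x, mask)
        print(mask, [len(f) for f in feats])
```

Every mask yields one feature per stage at the same resolutions, whereas an early-exit scheme that stopped after the first stage would return only the first scale.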
