Beyond Task-Driven Features for Object Detection
Meilun Zhou, Alina Zare

TL;DR
This paper proposes an annotation-guided feature augmentation method for object detection that improves object focus, robustness, and generalization by aligning features with annotation structure.
Contribution
It introduces a novel framework that injects annotation-guided embeddings into detection backbones, enhancing transferability and interpretability over traditional task-driven features.
Findings
Improves object focus and reduces background sensitivity.
Enhances generalization to unseen or weakly supervised tasks.
Delivers consistent performance gains across wildlife and remote sensing datasets.
Abstract
Task-driven features learned by modern object detectors optimize end task loss yet often capture shortcut correlations that fail to reflect underlying annotation structure. Such representations limit transfer, interpretability, and robustness when task definitions change or supervision becomes sparse. This paper introduces an annotation-guided feature augmentation framework that injects embeddings into an object detection backbone. The method constructs dense spatial feature grids from annotation-guided latent spaces and fuses them with feature pyramid representations to influence region proposal and detection heads. Experiments across wildlife and remote sensing datasets evaluate classification, localization, and data efficiency under multiple supervision regimes. Results show consistent improvements in object focus, reduced background sensitivity, and stronger generalization to unseen…
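The fusion step described in the abstract (a dense spatial grid of annotation-guided embeddings combined with feature pyramid levels) can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, so everything here is an assumption made for illustration: the grid is resized to each pyramid level with nearest-neighbour sampling, concatenated along the channel axis, and projected back to the original channel count with a 1x1 convolution written as a channel-wise matrix product. The names `resize_nearest`, `fuse_level`, `embed_grid`, and `proj`, and all shapes, are hypothetical.

```python
import numpy as np

def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) grid to (C, out_h, out_w)."""
    _, h, w = grid.shape
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return grid[:, rows[:, None], cols[None, :]]

def fuse_level(fpn_feat, embed_grid, proj):
    """Fuse one pyramid level with the embedding grid (assumed operator).

    fpn_feat:   (C, H, W) feature map from one pyramid level
    embed_grid: (E, h, w) dense grid of annotation-guided embeddings
    proj:       (C, C + E) weights of a 1x1 convolution
    """
    _, h, w = fpn_feat.shape
    resized = resize_nearest(embed_grid, h, w)
    stacked = np.concatenate([fpn_feat, resized], axis=0)   # (C + E, H, W)
    return np.einsum("oc,chw->ohw", proj, stacked)          # (C, H, W)

rng = np.random.default_rng(0)
C, E = 8, 4  # hypothetical channel counts

# Stand-in pyramid levels at three spatial resolutions.
pyramid = {name: rng.standard_normal((C, size, size))
           for name, size in [("p3", 32), ("p4", 16), ("p5", 8)]}
embed_grid = rng.standard_normal((E, 16, 16))

# Identity on the pyramid channels and zeros on the embedding channels,
# so the fusion starts as a no-op before any training.
proj = np.concatenate([np.eye(C), np.zeros((C, E))], axis=1)

fused = {name: fuse_level(feat, embed_grid, proj)
         for name, feat in pyramid.items()}
```

Because each fused map keeps the shape of the level it replaces, downstream region proposal and detection heads would need no changes in this sketch. Initialising `proj` as a pass-through is one design choice among several; it lets the embedding channels be blended in gradually during training.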
