Heuristic-inspired Reasoning Priors Facilitate Data-Efficient Referring Object Detection
Xu Zhang, Zhe Chen, Jing Zhang, Dacheng Tao

TL;DR
This paper introduces HeROD, a framework that incorporates heuristic-inspired reasoning priors into object detection models, significantly improving data efficiency and performance in low-data referring object detection tasks.
Contribution
It proposes a novel, interpretable method to embed spatial and semantic priors into a modern detection pipeline, enhancing data efficiency in scarce-label scenarios.
Findings
HeROD outperforms baseline models in low-data regimes on RefCOCO datasets.
Incorporating reasoning priors improves convergence speed and label efficiency.
The approach is model-agnostic and applicable to various vision-language tasks.
Abstract
Most referring object detection (ROD) models, especially modern grounding detectors, are designed for data-rich conditions, yet many practical deployments, such as robotics, augmented reality, and other specialized domains, face severe label scarcity. In such regimes, end-to-end grounding detectors must learn spatial and semantic structure from scratch, wasting precious samples. We ask a simple question: Can explicit reasoning priors help models learn more efficiently when data is scarce? To explore this, we first introduce the Data-efficient Referring Object Detection (De-ROD) task, a benchmark protocol for measuring ROD performance in low-data and few-shot settings. We then propose HeROD (Heuristic-inspired ROD), a lightweight, model-agnostic framework that injects explicit, heuristic-inspired spatial and semantic reasoning priors, which are interpretable…
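To make the idea of an explicit spatial prior concrete, here is a minimal illustrative sketch. It is not the HeROD implementation, whose details the abstract excerpt above does not give. It assumes a hypothetical setup in which a detector has already produced scored candidate boxes, and a hand-written rule on the cue words "left" and "right" re-ranks them. The names `Box`, `spatial_prior`, `rerank`, and the `weight` parameter are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical candidate box in pixel coordinates with a detector score.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # detector confidence, assumed to lie in [0, 1]


def spatial_prior(box: Box, expression: str, image_width: float) -> float:
    """Toy heuristic: favor boxes whose horizontal center matches a cue word.

    Returns a value in [0, 1]; 0.5 (neutral) when no cue word is present.
    """
    center_x = (box.x1 + box.x2) / 2 / image_width  # normalized to [0, 1]
    words = expression.lower().split()
    if "left" in words:
        return 1.0 - center_x
    if "right" in words:
        return center_x
    return 0.5


def rerank(boxes, expression, image_width, weight=0.5):
    """Sort boxes by a weighted mix of detector score and the spatial prior."""
    def combined(box: Box) -> float:
        prior = spatial_prior(box, expression, image_width)
        return (1 - weight) * box.score + weight * prior

    return sorted(boxes, key=combined, reverse=True)


# Two candidates: one on the left, one on the right with a slightly higher score.
boxes = [Box(10, 20, 110, 220, 0.60), Box(400, 20, 500, 220, 0.65)]
best_left = rerank(boxes, "the person on the left", image_width=640)[0]
best_right = rerank(boxes, "the person on the right", image_width=640)[0]
best_plain = rerank(boxes, "the person", image_width=640)[0]
```

In this toy setup the cue word overrides the small gap in detector scores, while an expression without a cue falls back to the detector ranking. A rule of this kind needs no training labels, which is the general intuition behind using priors when data is scarce.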
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Topic Modeling
