Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
Sunyuan Qiang, Xianfei Li, Yanyan Liang, Wenlong Liao, Tao He, Pai Peng

TL;DR
This paper introduces Neighboring Region Attention Alignment (NRAA), a novel method that enhances open-vocabulary object detection by leveraging neighboring region relationships through attention mechanisms, improving alignment with vision-language models.
Contribution
The paper proposes NRAA, a new attention-based alignment method that considers neighboring regions to improve open-vocabulary object detection performance.
Findings
NRAA outperforms existing distillation-based methods on open-vocabulary benchmarks.
Incorporating neighboring region attention improves detection accuracy for novel classes.
Extensive experiments demonstrate the effectiveness of NRAA in real-world scenarios.
Abstract
The diversity of real-world environments requires neural network models to expand from closed-category settings to accommodate newly emerging categories. In this paper, we study open-vocabulary object detection (OVD), which facilitates the detection of novel object classes under the supervision of only base annotations and open-vocabulary knowledge. However, we find that inadequate modeling of neighboring relationships between regions during the alignment process inevitably constrains the performance of recent distillation-based OVD strategies. To this end, we propose Neighboring Region Attention Alignment (NRAA), which performs alignment within the attention mechanism of a set of neighboring regions to boost open-vocabulary inference. Specifically, for a given proposal region, we randomly explore the neighboring boxes and conduct our proposed neighboring region…
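The abstract is cut off before the method details, so the following is only a rough illustrative sketch of the general idea it describes (sampling boxes near a proposal and letting their features attend to one another), not the authors' implementation. The function names `sample_neighbor_boxes` and `self_attention`, the jitter-based sampling rule, the `max_shift` parameter, and the random placeholder features are all assumptions introduced here; the distillation loss against a vision-language model is omitted.

```python
import numpy as np


def sample_neighbor_boxes(box, num_neighbors, max_shift=0.2, rng=None):
    """Hypothetical sampler: jitter a box (x1, y1, x2, y2) to get nearby boxes.

    Each corner coordinate moves by at most `max_shift` times the box
    width (x coordinates) or height (y coordinates).
    """
    rng = np.random.default_rng() if rng is None else rng
    box = np.asarray(box, dtype=float)
    width, height = box[2] - box[0], box[3] - box[1]
    scale = np.array([width, height, width, height]) * max_shift
    offsets = rng.uniform(-1.0, 1.0, size=(num_neighbors, 4)) * scale
    return box + offsets


def self_attention(features):
    """Single-head scaled dot-product self-attention with no learned weights."""
    dim = features.shape[-1]
    scores = features @ features.T / np.sqrt(dim)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ features, weights


rng = np.random.default_rng(0)
proposal = [50.0, 60.0, 150.0, 200.0]
neighbors = sample_neighbor_boxes(proposal, num_neighbors=4, rng=rng)
boxes = np.vstack([proposal, neighbors])  # row 0 is the original proposal

# Placeholder region embeddings; a real detector would pool these from
# its feature maps for each box (assumption for illustration only).
region_features = rng.normal(size=(boxes.shape[0], 8))
attended, weights = self_attention(region_features)
```

In a distillation-based setup, the attended feature for the proposal (row 0 of `attended`) would then be compared against a teacher embedding; how NRAA actually forms and aligns these features is not recoverable from the truncated abstract.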
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications
Methods: Sparse Evolutionary Training · Balanced Selection
