EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment
Cheng Shi, Sibei Yang

TL;DR
EdaDet introduces a dense alignment method that enhances open-vocabulary object detection by preserving local image semantics, leading to significant improvements in detecting novel categories without external data.
Contribution
The paper proposes Early Dense Alignment (EDA), an approach that improves generalization to novel categories by focusing on dense local semantics during training.
Findings
Improves novel box AP50 on COCO by 8.4%
Improves rare mask AP on LVIS by 3.9%
Outperforms existing methods while using no external resources
Abstract
Vision-language models such as CLIP have boosted the performance of open-vocabulary object detection, where the detector is trained on base categories but required to detect novel categories. Existing methods leverage CLIP's strong zero-shot recognition ability to align object-level embeddings with textual embeddings of categories. However, we observe that using CLIP for object-level alignment results in overfitting to base categories, i.e., novel categories most similar to base categories have particularly poor performance as they are recognized as similar base categories. In this paper, we first identify that the loss of critical fine-grained local image semantics hinders existing methods from attaining strong base-to-novel generalization. Then, we propose Early Dense Alignment (EDA) to bridge the gap between generalizable local semantics and object-level prediction. In EDA, we use…
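To make the distinction in the abstract concrete, the toy sketch below contrasts the two alignment styles it mentions: pooling local features into one object-level embedding before matching it to category text embeddings, versus matching each local feature to the text embeddings individually. This is not the authors' implementation, and it does not reproduce EDA itself (the abstract is truncated before the method is described). The 2-D vectors, the category names, and the helper names `cosine`, `classify`, `object_level_label`, and `dense_labels` are all hypothetical, chosen only for illustration; real systems would use CLIP image and text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(embedding, text_embeddings):
    """Return the category whose text embedding is most similar to `embedding`."""
    return max(text_embeddings, key=lambda name: cosine(embedding, text_embeddings[name]))


def object_level_label(local_features, text_embeddings):
    """Object-level alignment: average-pool local features, then classify once."""
    dim = len(local_features[0])
    count = len(local_features)
    pooled = [sum(f[i] for f in local_features) / count for i in range(dim)]
    return classify(pooled, text_embeddings)


def dense_labels(local_features, text_embeddings):
    """Dense alignment: classify every local feature on its own (no pooling)."""
    return [classify(f, text_embeddings) for f in local_features]


# Hypothetical toy data (assumption): two categories, three local features in a region.
text_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
local_features = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]

pooled_label = object_level_label(local_features, text_embeddings)
per_location = dense_labels(local_features, text_embeddings)
print(pooled_label, per_location)
```

In this toy case the pooled embedding yields a single label for the region, while the per-location labels still show that one local feature matches a different category. That is the kind of fine-grained local information the abstract says is lost under object-level alignment.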
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Contrastive Language-Image Pre-training · ALIGN · Balanced Selection
