Universal Object Detection with Large Vision Model
Feng Lin, Wenze Hu, Yaowei Wang, Yonghong Tian, Guangming Lu, Fanglin Chen, Yong Xu, Xiaoyu Wang

TL;DR
This paper presents a universal object detection approach built on a large pre-trained vision model. It addresses the challenges of multi-domain training and performs strongly on a large-scale cross-dataset benchmark, a step toward a general-purpose vision system.
Contribution
Introduces a hierarchy-aware loss and resource-efficient training method for universal object detection with a large vision model, handling cross-dataset label conflicts and hierarchies.
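As a rough illustration of what a hierarchy-aware loss can look like, the sketch below uses one common formulation: per-class sigmoid cross-entropy in which the ancestors of the annotated class count as positives and its descendants are ignored, since a coarse label leaves the finer class unknown. This is an assumption made for illustration and not necessarily the loss used in the paper. The toy taxonomy and the names `PARENT`, `targets_and_mask`, and `hierarchy_aware_bce` are hypothetical.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {"dog": "animal", "cat": "animal", "animal": None, "car": None}
CLASSES = sorted(PARENT)  # ['animal', 'car', 'cat', 'dog']


def ancestors(label):
    """Return every class above `label` in the taxonomy."""
    out = set()
    node = PARENT[label]
    while node is not None:
        out.add(node)
        node = PARENT[node]
    return out


def descendants(label):
    """Return every class below `label` in the taxonomy."""
    return {c for c in CLASSES if label in ancestors(c)}


def targets_and_mask(label):
    """Build per-class targets and a loss mask for one annotated box.

    Assumption: the label and its ancestors are positives, and the
    label's descendants are masked out instead of treated as negatives.
    """
    positives = {label} | ancestors(label)
    ignored = descendants(label)
    targets = [1.0 if c in positives else 0.0 for c in CLASSES]
    mask = [0.0 if c in ignored else 1.0 for c in CLASSES]
    return targets, mask


def hierarchy_aware_bce(logits, label):
    """Masked binary cross-entropy over per-class logits for one box."""
    targets, mask = targets_and_mask(label)
    total = 0.0
    for z, t, m in zip(logits, targets, mask):
        # Numerically stable BCE with logits.
        loss = max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
        total += m * loss
    return total / sum(mask)


if __name__ == "__main__":
    # A box labeled "animal": scores for "cat" and "dog" do not affect the loss.
    print(hierarchy_aware_bce([2.0, -2.0, 5.0, -5.0], "animal"))
    print(hierarchy_aware_bce([2.0, -2.0, -3.0, 7.0], "animal"))
```

Masking descendants instead of penalizing them is one way to keep a coarse annotation from one dataset from contradicting a finer annotation of the same object in another dataset. Other designs are possible, such as a softmax over sibling groups.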
Findings
Secured second place in RVC 2022 object detection track
Demonstrated effectiveness on a million-scale cross-dataset benchmark
Provided open-source code for reproducibility
Abstract
Over the past few years, there has been growing interest in developing a broad, universal, and general-purpose computer vision system. Such systems have the potential to address a wide range of vision tasks simultaneously, without being limited to specific problems or data domains. This universality is crucial for practical, real-world computer vision applications. In this study, our focus is on a specific challenge: the large-scale, multi-domain universal object detection problem, which contributes to the broader goal of achieving a universal vision system. This problem presents several intricate challenges, including cross-dataset category label duplication, label conflicts, and the necessity to handle hierarchical taxonomies. To address these challenges, we introduce our approach to label handling, hierarchy-aware loss design, and resource-efficient model training utilizing a…
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Machine Learning and Data Classification
