Improving Long-tailed Object Detection with Image-Level Supervision by Multi-Task Collaborative Learning
Bo Li, Yongqiang Yao, Jingru Tan, Xin Lu, Fengwei Yu, Ye Luo, Jianwei Lu

TL;DR
This paper introduces CLIS, a multi-task collaborative learning framework that uses image-level supervision to improve long-tailed object detection, especially for tail categories, and achieves state-of-the-art results on the LVIS dataset.
Contribution
Proposes a novel multi-task collaborative learning framework that effectively utilizes image-level supervision to enhance tail category detection in long-tailed datasets.
Findings
Achieves 31.1 AP on the LVIS dataset, surpassing previous methods.
Improves tail category AP by 10.1 points.
Demonstrates effectiveness without complex loss engineering.
Abstract
Data in real-world object detection often exhibits a long-tailed distribution. Existing solutions tackle this problem by mitigating the competition between the head and tail categories. However, due to the scarcity of training samples, tail categories are still unable to learn discriminative representations. Bringing more data into training may alleviate the problem, but collecting instance-level annotations is an excruciating task. In contrast, image-level annotations are easily accessible but not fully exploited. In this paper, we propose a novel framework, CLIS (multi-task Collaborative Learning with Image-level Supervision), which leverages image-level supervision to enhance detection ability in a multi-task collaborative way. Specifically, there are an object detection task (consisting of an instance-classification task and a localization task) and an image-classification…
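As a rough illustration of how an image-classification task can sit alongside a detection task, the sketch below collapses instance labels into a multi-hot image-level target, scores image-level logits with a binary cross-entropy, and adds that term to a detection loss. This is a minimal sketch under stated assumptions, not the CLIS method: the abstract excerpt does not specify the losses or their weighting, and the names `image_level_targets`, `multilabel_bce`, `joint_loss`, and `img_weight` are hypothetical.

```python
import math


def image_level_targets(instance_labels, num_classes):
    """Collapse per-instance category ids into a multi-hot image-level vector."""
    targets = [0.0] * num_classes
    for category_id in instance_labels:
        targets[category_id] = 1.0
    return targets


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over classes, computed from raw logits.

    Uses the numerically stable form max(z, 0) - z * t + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def joint_loss(det_loss, img_cls_loss, img_weight=1.0):
    """Hypothetical combination: detection loss plus a weighted image-level term."""
    return det_loss + img_weight * img_cls_loss


if __name__ == "__main__":
    # An image with two instances of class 2 and one of class 0, out of 4 classes.
    targets = image_level_targets([2, 0, 2], num_classes=4)
    img_loss = multilabel_bce([1.5, -2.0, 0.3, -1.0], targets)
    print(targets, joint_loss(det_loss=0.8, img_cls_loss=img_loss, img_weight=0.5))
```

The image-level target needs only the set of categories present in an image, which is why such labels are cheaper to collect than boxes.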
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning
