Modulating Localization and Classification for Harmonized Object Detection
Taiheng Zhang, Qiaoyong Zhong, Shiliang Pu, Di Xie

TL;DR
This paper introduces a mutual learning framework that modulates the localization and classification tasks in object detection, reducing the divergence between them and improving performance on the COCO dataset.
Contribution
It proposes a novel mutual learning strategy with a labeling scheme and IoU rescoring to harmonize localization and classification in CNN-based detectors.
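The summary does not spell out the mutual labeling rule, so the following is only a hypothetical Python sketch of one way two heads could "label" each other. Each positive sample's classification target is the IoU between its regressed box and its assigned ground truth, and its localization loss weight is its own classification score. The names `mutual_labels`, `soft_bce`, and `box_iou`, and the labeling rule itself, are assumptions for illustration, not the authors' definition.

```python
import math
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mutual_labels(
    pred_boxes: Sequence[Box],
    assigned_gt: Sequence[Optional[Box]],
    cls_scores: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """HYPOTHETICAL mutual labeling: each head supplies targets or weights for the other.

    A sample is positive if it has an assigned ground-truth box, negative if None.
    """
    cls_targets: List[float] = []
    loc_weights: List[float] = []
    for box, gt, score in zip(pred_boxes, assigned_gt, cls_scores):
        if gt is None:
            cls_targets.append(0.0)  # negative: background target
            loc_weights.append(0.0)  # negative: no box regression
        else:
            cls_targets.append(box_iou(box, gt))  # localization quality labels classification
            loc_weights.append(score)  # classification confidence weights localization
    return cls_targets, loc_weights


def soft_bce(probs: Sequence[float], targets: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy that accepts soft (IoU-valued) targets."""
    if not probs:
        raise ValueError("need at least one prediction")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10), (2, 2, 4, 4)]
    gts = [(0, 0, 10, 10), (5, 0, 15, 10), None]
    targets, weights = mutual_labels(preds, gts, [0.7, 0.4, 0.2])
    # targets ≈ [1.0, 0.333, 0.0]; weights == [0.7, 0.4, 0.0]
    print(targets, weights, soft_bce([0.7, 0.4, 0.2], targets))
```

Tying each target to the other head's output is what makes the coupling "mutual" in this sketch: a poorly localized box cannot earn a high classification target, and a low-confidence box contributes little to the regression loss.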
Findings
Significant performance improvements on the COCO dataset
Effective reduction of localization-classification divergence
General-purpose approach that plugs into existing detectors such as FCOS and RetinaNet
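The paper quantifies localization-classification divergence with a Spearman rank correlation-based metric, but its exact form is not given here. A minimal standard-library sketch, assuming the metric correlates the classification scores of detections with their IoUs against matched ground-truth boxes: a coefficient near 1 means the most confident boxes are also the best localized, while a low or negative value signals divergence. `average_ranks`, `spearman_rho`, and `score_iou_agreement` are hypothetical names.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Grow the run while the next value ties with the first value of the run.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to a 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5


def score_iou_agreement(cls_scores: Sequence[float], ious: Sequence[float]) -> float:
    """HYPOTHETICAL divergence probe: rho between confidences and IoUs of matched detections."""
    return spearman_rho(cls_scores, ious)


if __name__ == "__main__":
    print(score_iou_agreement([0.9, 0.8, 0.3, 0.6], [0.85, 0.7, 0.4, 0.75]))  # 0.8
```

In the example, the two middle detections swap order between the confidence ranking and the IoU ranking, which pulls rho down from 1.0 to 0.8. Ranks rather than raw values are used, so the probe is insensitive to how scores are calibrated.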
Abstract
Object detection involves two sub-tasks, i.e., localizing objects in an image and classifying them into various categories. For existing CNN-based detectors, we observe a widespread divergence between localization and classification, which degrades performance. In this work, we propose a mutual learning framework to modulate the two tasks. In particular, the two tasks are forced to learn from each other through a novel mutual labeling strategy. In addition, we introduce a simple yet effective IoU rescoring scheme, which further reduces the divergence. Moreover, we define a Spearman rank correlation-based metric to quantify the divergence, which correlates well with detection performance. The proposed approach is general-purpose and can easily be integrated into existing detectors such as FCOS and RetinaNet. We achieve a significant performance gain over the baseline detectors…
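The abstract describes the IoU rescoring scheme only as simple yet effective. A common pattern, assumed here rather than taken from the paper, is to rank detections by a weighted geometric mean of classification confidence and predicted IoU before non-maximum suppression, so that a well-localized box can outrank a confident but poorly localized one. `rescore`, `alpha`, `box_iou`, and the greedy `nms` helper are illustrative names, not the authors' code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def rescore(cls_scores: Sequence[float], pred_ious: Sequence[float], alpha: float = 0.5) -> List[float]:
    """ASSUMED rescoring rule: confidence ** (1 - alpha) * predicted_iou ** alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [c ** (1.0 - alpha) * max(u, 0.0) ** alpha for c, u in zip(cls_scores, pred_ious)]


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep those not overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11)]  # IoU = 81/119 ≈ 0.68, so NMS keeps one box
    cls = [0.9, 0.8]
    pred_iou = [0.5, 0.9]
    print(nms(boxes, cls))  # [0]
    print(nms(boxes, rescore(cls, pred_iou)))  # [1]
```

With confidence alone, NMS keeps the first box; after rescoring, the second box (predicted IoU 0.9) ranks first and survives instead. Setting `alpha=0` recovers plain confidence ranking, which makes the rule easy to ablate.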
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods
Methods: 1x1 Convolution · Convolution · Non Maximum Suppression · FCOS · Focal Loss · Feature Pyramid Network · RetinaNet
