Teach-DETR: Better Training DETR with Teachers
Linjiang Huang, Kaixin Lu, Guanglu Song, Liang Wang, Si Liu, Yu Liu, Hongsheng Li

TL;DR
Teach-DETR introduces a teacher-based training scheme for DETR detectors, leveraging multiple teacher models' predicted boxes to enhance detection accuracy without extra inference cost.
Contribution
The paper proposes a training approach that uses teacher detectors' predicted boxes to improve DETR-based models. It supports multiple teachers at once, introduces no additional parameters, and adds negligible computational cost during training.
Findings
Improves DINO detector accuracy from 57.8% to 58.9% mAP.
Effective knowledge transfer from both RCNN-based and DETR-based teachers.
No additional inference overhead during deployment.
Abstract
In this paper, we present a novel training scheme, namely Teach-DETR, to learn better DETR-based detectors from versatile teacher detectors. We show that the predicted boxes from teacher detectors are an effective medium to transfer knowledge of teacher detectors, which could be either RCNN-based or DETR-based detectors, to train a more accurate and robust DETR model. This new training scheme can easily incorporate the predicted boxes from multiple teacher detectors, each of which provides parallel supervision to the student DETR. Our strategy introduces no additional parameters and adds negligible computational cost to the original detector during training. During inference, Teach-DETR brings zero additional overhead and maintains the merit of requiring no non-maximum suppression. Extensive experiments show that our method leads to consistent improvement for various DETR-based detectors.…
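The idea of parallel supervision from several teachers can be illustrated with a toy sketch. This is not the authors' implementation: it assumes boxes in (cx, cy, w, h) format, a plain L1 box cost, brute-force one-to-one matching in place of the Hungarian algorithm, and a loss weighted by each teacher box's confidence score. All function names (`l1_cost`, `match_one_to_one`, `parallel_supervision_loss`) are hypothetical.

```python
from itertools import permutations


def l1_cost(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_one_to_one(preds, targets):
    """Brute-force one-to-one assignment of targets to distinct predictions.

    Returns (pred_index, target_index) pairs that minimise the total L1 cost.
    Only practical for toy sizes; DETR-style code normally uses the
    Hungarian algorithm for this step.
    """
    best_pairs, best_cost = [], float("inf")
    for pred_ids in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_cost(preds[p], targets[t]) for t, p in enumerate(pred_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, t) for t, p in enumerate(pred_ids)]
    return best_pairs


def parallel_supervision_loss(preds, gt_boxes, teacher_outputs):
    """Sum box losses over the ground truth and every teacher.

    Each supervision source is matched to the student predictions
    independently, so the teachers act as parallel auxiliary targets.
    teacher_outputs is a list of teachers, each a list of (box, score) pairs.
    Weighting by the teacher score is an assumption of this sketch.
    """
    total = sum(
        l1_cost(preds[p], gt_boxes[t]) for p, t in match_one_to_one(preds, gt_boxes)
    )
    for boxes_and_scores in teacher_outputs:
        boxes = [box for box, _ in boxes_and_scores]
        scores = [score for _, score in boxes_and_scores]
        for p, t in match_one_to_one(preds, boxes):
            total += scores[t] * l1_cost(preds[p], boxes[t])
    return total
```

Because the teacher boxes only enter the loss, a scheme of this shape leaves the student's architecture and inference path unchanged, which is consistent with the zero inference overhead claimed in the abstract.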
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Vision Transformer · Layer Normalization · Adam · Absolute Position Encodings · Linear Layer · Dense Connections · Residual Connection · Byte Pair Encoding
