Re-examining Distillation For Continual Object Detection
Eli Verwimp, Kuo Yang, Sarah Parisot, Hong Lanqing, Steven McDonagh, Eduardo Pérez-Pellitero, Matthias De Lange, Tinne Tuytelaars

TL;DR
This paper analyzes why continual object detection models forget previously learned knowledge and proposes improvements to distillation techniques, especially for classification heads, to enhance learning across classes and domains.
Contribution
It identifies issues with teacher predictions in distillation for object detection and introduces adaptive Huber loss and prediction filtering to improve continual learning.
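For readers unfamiliar with the Huber loss named above, the following is a minimal sketch of the standard (non-adaptive) form applied to the gap between student and teacher logits. It is not the paper's implementation: the function names `huber` and `huber_distillation_loss` are hypothetical, `delta` is a fixed illustrative threshold, and how the paper adapts the loss or filters teacher predictions is not specified in this summary.

```python
def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss for one residual.

    Quadratic for |residual| <= delta, linear beyond it, so large
    disagreements are penalised less sharply than with squared error.
    """
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * residual * residual
    return delta * (magnitude - 0.5 * delta)


def huber_distillation_loss(student_logits, teacher_logits, delta=1.0):
    """Mean Huber loss between student and teacher logits.

    Assumption: a plain element-wise comparison with a fixed delta;
    the paper's adaptive variant may choose delta differently.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher logits must have equal length")
    if not student_logits:
        raise ValueError("logits must be non-empty")
    total = sum(
        huber(s - t, delta) for s, t in zip(student_logits, teacher_logits)
    )
    return total / len(student_logits)


if __name__ == "__main__":
    # Small gap (0.5) falls in the quadratic zone, large gap (3.0) in the linear zone.
    print(huber_distillation_loss([0.5, 3.0], [0.0, 0.0], delta=1.0))
```

The linear tail is what makes this loss relevant here: a student that strongly disagrees with a confidently wrong teacher is pulled toward the teacher less aggressively than it would be under a squared-error penalty.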
Findings
Improved distillation method enhances continual detection performance.
Effective in class-incremental and domain-incremental settings.
Addresses overconfidence in teacher predictions for better learning.
Abstract
Training models continually to detect and classify objects, from new classes and new domains, remains an open problem. In this work, we conduct a thorough analysis of why and how object detection models forget catastrophically. We focus on distillation-based approaches in two-stage networks, the most common strategy employed in contemporary continual object detection work. Distillation aims to transfer the knowledge of a model trained on previous tasks -- the teacher -- to a new model -- the student -- while it learns the new task. We show that this works well for the region proposal network, but that wrong yet overly confident teacher predictions prevent student models from effectively learning the classification head. Our analysis provides a foundation that allows us to propose improvements for existing techniques by detecting incorrect teacher predictions, based on current…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Huber loss
