On-Device Domain Generalization
Kaiyang Zhou, Yuanhan Zhang, Yuhang Zang, Jingkang Yang, Chen Change Loy, Ziwei Liu

TL;DR
This paper investigates domain generalization (DG) for tiny neural networks deployed on devices. It finds that knowledge distillation (KD) and the proposed out-of-distribution knowledge distillation (OKD) method significantly improve DG performance without increasing model size.
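The TL;DR attributes much of the gain to knowledge distillation. The sketch below shows the widely used soft-target KD objective in plain Python, where the student learns to match the teacher's temperature-softened output distribution. It is an illustration under assumptions: the page does not state the paper's exact loss, and the temperature, the weight `alpha`, and all function names are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target KD loss: T^2 * KL(teacher || student) on softened outputs."""
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


def cross_entropy(logits, label):
    """Hard-label cross-entropy at temperature 1."""
    return -log_softmax(logits)[label]


def student_objective(student_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """HYPOTHETICAL weighting (not taken from the paper): (1 - alpha) * CE + alpha * KD."""
    return (1 - alpha) * cross_entropy(student_logits, label) + alpha * kd_loss(
        student_logits, teacher_logits, temperature
    )
```

When the student's logits equal the teacher's, the KD term is zero and only the cross-entropy remains.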
Contribution
The paper introduces OKD, a distillation approach that uses synthesized out-of-distribution data and is tailored to tiny networks in on-device domain generalization tasks.
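The core OKD idea is to distill the teacher's behaviour on synthesized out-of-distribution inputs. The sketch below illustrates that idea under stated assumptions and is not the authors' method. `synthesize_shift` is a hypothetical random gain-and-bias perturbation standing in for whatever augmentation the paper actually uses. The name `okd_style_loss`, the temperature, and the interface that treats `teacher` and `student` as callables returning logits are also assumptions.

```python
import math
import random


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_kl(teacher_logits, student_logits, temperature=4.0):
    """T^2 * KL(teacher || student) between temperature-softened predictions."""
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    return temperature ** 2 * sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))


def synthesize_shift(features, rng, gain_range=(0.5, 1.5), bias_range=(-1.0, 1.0)):
    """HYPOTHETICAL domain-shift synthesizer: one random gain and bias applied to all
    features, a crude stand-in for a global style change. Not the paper's augmentation."""
    gain = rng.uniform(*gain_range)
    bias = rng.uniform(*bias_range)
    return [gain * x + bias for x in features]


def okd_style_loss(teacher, student, features, rng, temperature=4.0):
    """Distill the teacher's predictions on a synthesized out-of-distribution view.

    `teacher` and `student` are any callables mapping a feature list to logits.
    """
    shifted = synthesize_shift(features, rng)
    return kd_kl(teacher(shifted), student(shifted), temperature)
```

During training, a term like this would typically be added to the usual in-distribution KD and cross-entropy terms. This page does not say how the paper combines or weights them.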
Findings
KD outperforms traditional DG methods for tiny networks
OKD improves DG performance without increasing model complexity
Synthesized domain shifts effectively enhance generalization
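DG performance is commonly measured with a leave-one-domain-out protocol. The model is trained on every source domain except one and then evaluated on the held-out domain. The sketch below assumes that protocol, since this page does not say which benchmarks or splits the paper uses. The function names are hypothetical.

```python
def leave_one_domain_out(domains):
    """Yield (source_domains, target_domain) pairs: train on all domains but one,
    then evaluate on the held-out domain, which is never seen during training."""
    for i, target in enumerate(domains):
        sources = domains[:i] + domains[i + 1:]
        yield sources, target


def mean_target_accuracy(per_target_accuracy):
    """Average held-out accuracy over all splits, a common single-number DG summary.

    HYPOTHETICAL helper: `per_target_accuracy` maps each held-out domain to its accuracy.
    """
    if not per_target_accuracy:
        raise ValueError("need at least one held-out domain")
    return sum(per_target_accuracy.values()) / len(per_target_accuracy)
```

With three domains, the protocol yields three splits, and each domain serves exactly once as the unseen target.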
Abstract
We present a systematic study of domain generalization (DG) for tiny neural networks. This problem is critical to on-device machine learning applications but has been overlooked in the literature, where research has focused mainly on large models. Tiny neural networks have far fewer parameters and lower complexity and therefore should not be trained the same way as their large counterparts for DG applications. By conducting extensive experiments, we find that knowledge distillation (KD), a well-known technique for model compression, is much better for tackling the on-device DG problem than conventional DG methods. Another interesting observation is that the teacher-student gap on out-of-distribution data is bigger than that on in-distribution data, which highlights the capacity mismatch issue as well as the shortcoming of KD. We further propose a method called out-of-distribution…
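The abstract's observation about the teacher-student gap amounts to comparing two differences: the gap measured out of distribution and the gap measured in distribution. The sketch below assumes the gap is the difference in top-1 accuracy, which this page does not specify. The function names and the (teacher_preds, student_preds, labels) split format are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def teacher_student_gap(teacher_preds, student_preds, labels):
    """Teacher accuracy minus student accuracy on one evaluation split."""
    return accuracy(teacher_preds, labels) - accuracy(student_preds, labels)


def gap_increase(id_split, ood_split):
    """HYPOTHETICAL summary: each split is (teacher_preds, student_preds, labels).
    A positive result means the teacher-student gap is wider out of distribution
    than in distribution, which is the pattern the abstract reports."""
    return teacher_student_gap(*ood_split) - teacher_student_gap(*id_split)
```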
Taxonomy
Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
Methods: Knowledge Distillation
