Practical Knowledge Distillation: Using DNNs to Beat DNNs
Chung-Wei Lee, Pavlos Athanasios Apostolopulos, Igor L. Markov

TL;DR
This paper introduces practical knowledge-distillation and data-denoising techniques that let DNNs match or surpass gradient boosting on tabular data. The methods come with a theoretical justification and are applied in a real-world industrial ML platform.
Contribution
It presents a new combination of data and model distillation methods, including input-data distillation and optimized ensembling, and proves that the practical method is equivalent to classical cross-entropy knowledge distillation.
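One standard identity helps build intuition for the equivalence claim. It is shown here as an illustration, not as the paper's proof. Assume each distilled training label y is drawn from the teacher's predicted class distribution p. Then the student's expected log loss on that label, -log q_y, equals the soft-label cross-entropy H(p, q) = -Σ_k p_k log q_k used in classical knowledge distillation. The sketch below checks this by enumerating every class exactly rather than sampling. The function names and example distributions are hypothetical.

```python
import math


def cross_entropy(p, q):
    """Soft-label cross-entropy H(p, q) = -sum_k p_k * log(q_k),
    the classical knowledge-distillation loss (hypothetical helper)."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)


def expected_hard_label_loss(p, q):
    """Expected log loss -log(q_y) when the label y is drawn from the
    teacher distribution p, computed exactly by summing over all classes
    (hypothetical helper; assumes labels are sampled from the teacher)."""
    total = 0.0
    for y, py in enumerate(p):
        if py > 0:
            total += py * -math.log(q[y])
    return total


teacher = [0.7, 0.2, 0.1]  # hypothetical teacher class probabilities
student = [0.6, 0.3, 0.1]  # hypothetical student class probabilities

print(cross_entropy(teacher, student))
print(expected_hard_label_loss(teacher, student))
```

Both calls return the same value, and by Gibbs' inequality the loss is smallest when the student reproduces the teacher exactly (q = p).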
Findings
DNNs can outperform gradient boosting on small datasets.
Input-data distillation and optimized ensembling help DNN accuracy match or exceed that of gradient boosting.
The methods are applied to an industrial end-to-end real-time ML platform that serves 4M production inferences per second.
Abstract
For tabular datasets, we explore data and model distillation, as well as data denoising. These techniques improve both gradient-boosting models and a specialized DNN architecture. While gradient boosting is known to outperform DNNs on tabular data, we close the gap for datasets with 100K+ rows and give DNNs an advantage on small datasets. We extend these results with input-data distillation and optimized ensembling to help DNN performance match or exceed that of gradient boosting. As a theoretical justification of our practical method, we prove its equivalence to classical cross-entropy knowledge distillation. We also qualitatively explain the superiority of DNN ensembles over XGBoost on small datasets. For an industrial end-to-end real-time ML platform with 4M production inferences per second, we develop a model-training workflow based on data sampling that distills ensembles of…
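The idea of distilling an ensemble through the training data can be sketched in a few lines. This toy version rests on loudly stated assumptions. Logistic regressions stand in for the paper's DNNs, and each teacher is trained on a bootstrap resample. The student is then trained on the teachers' averaged probabilities instead of the original noisy 0/1 labels. None of the names (`fit_logreg`, `soft_labels`, etc.), the data, or the hyperparameters come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, lr=0.5, epochs=500, l2=1e-3):
    """Hypothetical helper: logistic regression by full-batch gradient descent.
    y may hold hard labels in {0, 1} or soft labels in [0, 1]; the binary
    cross-entropy gradient X^T (sigmoid(Xw) - y) / n is the same for both."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y) + l2 * w
        w -= lr * grad
    return w


def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return sigmoid(Xb @ w)


# Toy tabular data (hypothetical): 2 features, labels from a noisy linear rule.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n) > 0).astype(float)

# 1) Teacher ensemble: each member is trained on a bootstrap resample.
teachers = []
for _ in range(5):
    idx = rng.integers(0, n, size=n)
    teachers.append(fit_logreg(X[idx], y[idx]))

# 2) Distill into the data: replace the 0/1 labels with averaged ensemble probabilities.
soft_labels = np.mean([predict_proba(w, X) for w in teachers], axis=0)

# 3) Student: a single model trained only on the soft labels.
student = fit_logreg(X, soft_labels)
```

Because the binary cross-entropy gradient has the same form for hard and soft targets, one training routine serves both the teachers and the student. In a realistic setting the student would usually be a different, cheaper model than the teachers.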
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Fault Detection and Control Systems · Anomaly Detection Techniques and Applications
