Knowledge Distillation for Efficient Sequences of Training Runs

Xingyu Liu; Alex Leonardi; Lu Yu; Chris Gilmer-Hill; Matthew Leavitt; Jonathan Frankle

arXiv:2303.06480·cs.LG·March 14, 2023·1 cites

Open Access

TL;DR

This paper shows that distilling knowledge from earlier training runs into later ones in a sequence (for example, during hyperparameter search or continual retraining) reduces the time needed to train the later models, even after accounting for the overhead of distillation. Two strategies that cut this overhead by 80-90% improve the overall cost further.

Contribution

The paper studies how knowledge distillation (KD) can reuse the computation invested in previous training runs, and proposes two strategies that reduce the overhead of KD by 80-90% with minimal effect on accuracy.

Findings

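01. KD from previous runs reduces training time for subsequent runs, even accounting for KD overhead

02. Two proposed strategies cut KD overhead by 80-90% with minimal effect on accuracy

03. Overall training cost improves substantially (Pareto improvements) in practical training workflows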

Abstract

In many practical scenarios -- like hyperparameter search or continual retraining with new data -- related training runs are performed many times in sequence. Current practice is to train each of these models independently from scratch. We study the problem of exploiting the computation invested in previous runs to reduce the cost of future runs using knowledge distillation (KD). We find that augmenting future runs with KD from previous runs dramatically reduces the time necessary to train these models, even taking into account the overhead of KD. We improve on these results with two strategies that reduce the overhead of KD by 80-90% with minimal effect on accuracy and vast pareto-improvements in overall cost. We conclude that KD is a promising avenue for reducing the cost of the expensive preparatory work that precedes training final models in practice.
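As a rough illustration of what "augmenting future runs with KD from previous runs" can look like, the sketch below computes a standard temperature-scaled distillation loss for a single example, where the teacher logits come from a model trained in an earlier run. The abstract does not state the exact loss, so this formulation (hard-label cross-entropy blended with a KL term on softened outputs) is an assumption, and the names `kd_loss`, `alpha`, and `temperature` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Blend hard-label cross-entropy with a soft-target KL term.

    Assumed formulation, not taken from the paper:
      loss = (1 - alpha) * CE(student, label)
             + alpha * T^2 * KL(teacher_T || student_T)
    """
    # Cross-entropy of the student against the ground-truth label (T = 1).
    ce = -log_softmax(student_logits)[label]

    # KL divergence between temperature-softened teacher and student outputs.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )

    # T^2 keeps the soft-target term's gradient scale comparable across T.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


# Teacher logits would come from a model saved by a previous run.
loss = kd_loss([0.2, 1.5, -0.3], [0.1, 2.0, -1.0], label=1)
```

The teacher forward pass needed to produce `teacher_logits` at every step is the kind of overhead the abstract refers to. The two overhead-reduction strategies are not described in the abstract and are not modeled here.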

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Reservoir Engineering and Simulation Methods

Methods: Knowledge Distillation