Distilling Interpretable Models into Human-Readable Code
Walker Ravina, Ethan Sterling, Olexiy Oryeshko, Nathan Bell, Honglei Zhuang, Xuanhui Wang, Yonghui Wu, Alexander Grushetsky

TL;DR
This paper introduces a method for distilling interpretable models into concise, human-readable code by approximating their univariate numerical functions with piecewise-linear curves, so that the resulting models can be reviewed and manually edited like source code.
Contribution
It proposes a novel distillation approach that approximates models with concise, interpretable piecewise-linear functions, enabling easier review and manual modification.
Findings
Effective across classification, regression, and ranking tasks.
Produces accurate, concise, and human-readable models.
Demonstrates broad applicability across four datasets.
Abstract
The goal of model distillation is to faithfully transfer teacher model knowledge to a model which is faster, more generalizable, more interpretable, or possesses other desirable characteristics. Human-readability is an important and desirable standard for machine-learned model interpretability. Readable models are transparent and can be reviewed, manipulated, and deployed like traditional source code. As a result, such models can be improved outside the context of machine learning and manually edited if desired. Given that directly training such models is difficult, we propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code. The proposed distillation methodology approximates a model's univariate numerical functions with piecewise-linear curves in a localized manner. The resulting curve model representations are…
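The idea of approximating a univariate numerical function with a piecewise-linear curve can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple greedy knot-insertion heuristic, distinct sample x-values, and a hypothetical `piecewise_linear([...])` output syntax chosen only to show what "human-readable code" might look like. The function names `interpolate`, `fit_piecewise_linear`, and `to_source` are likewise illustrative.

```python
from bisect import bisect_right


def interpolate(x, knots):
    """Evaluate a piecewise-linear curve given as sorted (x, y) knots.

    Values outside the knot range are clamped to the end knots.
    """
    if x <= knots[0][0]:
        return knots[0][1]
    if x >= knots[-1][0]:
        return knots[-1][1]
    i = bisect_right([kx for kx, _ in knots], x)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fit_piecewise_linear(xs, ys, max_knots=5, tol=1e-9):
    """Greedy knot insertion (an assumed heuristic, not the paper's method).

    Starts from the two end points and repeatedly adds the sample with the
    largest absolute error until max_knots is reached or the error is <= tol.
    Assumes the x-values are distinct.
    """
    points = sorted(zip(xs, ys))
    knots = [points[0], points[-1]]

    def error(p):
        return abs(p[1] - interpolate(p[0], knots))

    while len(knots) < max_knots:
        worst = max(points, key=error)
        if error(worst) <= tol:
            break
        knots = sorted(knots + [worst])
    return knots


def to_source(name, knots):
    """Render the knots as one readable line (hypothetical output syntax)."""
    pts = ", ".join(f"({x:.3g}, {y:.3g})" for x, y in knots)
    return f"{name} = piecewise_linear([{pts}])"


# Example: distill samples of |x| on [-1, 1] into a three-knot curve.
xs = [i / 10 for i in range(-10, 11)]
ys = [abs(x) for x in xs]
knots = fit_piecewise_linear(xs, ys, max_knots=3)
print(to_source("f", knots))
```

Because the output is a short list of control points, a reviewer could read it, compare it against domain expectations, and adjust an individual knot by hand, which is the kind of manual editing the abstract describes.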
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
