# Interpretable multiclass classification by MDL-based rule lists

**Authors:** Hugo M. Proença, Matthijs van Leeuwen

arXiv: 1905.00328 · 2019-11-01

## TL;DR

This paper introduces a novel MDL-based approach for learning compact, interpretable probabilistic rule lists for multiclass classification, effectively balancing model complexity and accuracy without extensive hyperparameter tuning.

## Contribution

The paper proposes a new MDL-based formalization of probabilistic rule lists for multiclass classification and introduces the Classy algorithm, which greedily searches for rule lists under this criterion and is insensitive to its only parameter, the candidate set.
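As a reminder of what an MDL-based selection criterion looks like in general, the block below states the generic two-part form. The symbols ($M$ for a rule list, $\mathcal{M}$ for the set of candidate rule lists, $D$ for the training data, $L(\cdot)$ for code length in bits) are notation assumed here for illustration. The specific encodings the paper uses are not reproduced in this summary.

```latex
% Generic two-part MDL model selection (notation assumed, not taken from the paper):
% choose the model that minimizes model cost plus data cost given the model.
M^{*} = \operatorname*{arg\,min}_{M \in \mathcal{M}} \; \bigl[ \, L(M) + L(D \mid M) \, \bigr]
```

A longer rule list raises $L(M)$ but can lower $L(D \mid M)$ by fitting the class labels better, which is how complexity and goodness of fit are traded off in a single score.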

## Key findings

- Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers on the combination of predictive performance and interpretability.
- The method is insensitive to its only parameter, the candidate set.
- Training set compression correlates with classification performance.

## Abstract

Interpretable classifiers have recently witnessed an increase in attention from the data mining community because they are inherently easier to understand and explain than their more complex counterparts. Examples of interpretable classification models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In this paper, we consider the problem of learning compact yet accurate probabilistic rule lists for multiclass classification. Specifically, we propose a novel formalization based on probabilistic rule lists and the minimum description length (MDL) principle. This results in virtually parameter-free model selection that naturally allows trading off model complexity against goodness of fit, by which overfitting and the need for hyperparameter tuning are effectively avoided. Finally, we introduce the Classy algorithm, which greedily finds rule lists according to the proposed criterion. We empirically demonstrate that Classy selects small probabilistic rule lists that outperform state-of-the-art classifiers when it comes to the combination of predictive performance and interpretability. We show that Classy is insensitive to its only parameter, i.e., the candidate set, and that compression on the training set correlates with classification performance, validating our MDL-based selection criterion.
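To make the idea of greedily growing a rule list under a compression criterion concrete, here is a minimal Python sketch. It is not the Classy implementation. The function names (`data_bits`, `total_bits`, `greedy_rule_list`), the single-condition candidates, the flat per-rule model cost, the add-one smoothed sequential label code, and the use of absolute gain are all simplifying assumptions made for illustration.

```python
import math
from collections import Counter


def data_bits(labels, n_classes):
    """Bits to encode labels one by one with an add-one smoothed running estimate."""
    counts, bits = Counter(), 0.0
    for i, label in enumerate(labels):
        bits -= math.log2((counts[label] + 1) / (i + n_classes))
        counts[label] += 1
    return bits


def total_bits(rules, X, y, n_classes, rule_cost):
    """Two-part score: assumed model bits plus data bits under an ordered rule list."""
    groups = [[] for _ in range(len(rules) + 1)]  # last group is the default rule
    for row, label in zip(X, y):
        for k, (feat, val) in enumerate(rules):
            if row[feat] == val:  # first matching rule claims the row
                groups[k].append(label)
                break
        else:
            groups[-1].append(label)  # no rule matched: default rule
    return len(rules) * rule_cost + sum(data_bits(g, n_classes) for g in groups)


def greedy_rule_list(X, y):
    """Append the candidate that most reduces total bits; stop when none helps."""
    n_classes = len(set(y))
    # Hypothetical candidate set: every (feature index, value) pair seen in X.
    candidates = sorted({(j, v) for row in X for j, v in enumerate(row)})
    rule_cost = math.log2(len(candidates) + 1)  # assumed flat cost per rule
    rules = []
    best = total_bits(rules, X, y, n_classes, rule_cost)
    while True:
        scored = [(total_bits(rules + [c], X, y, n_classes, rule_cost), c)
                  for c in candidates if c not in rules]
        if not scored:
            break
        score, cand = min(scored)
        if score >= best:  # no candidate compresses the data further
            break
        rules.append(cand)
        best = score
    return rules, best


# Toy data: feature 0 determines the class, feature 1 is noise.
X = [("a", "p" if i % 2 == 0 else "q") for i in range(8)] \
  + [("b", "p" if i % 2 == 0 else "q") for i in range(8)]
y = ["x"] * 8 + ["y"] * 8
rules, bits = greedy_rule_list(X, y)
baseline = total_bits([], X, y, n_classes=2, rule_cost=0.0)
print(rules, round(bits, 2), round(baseline, 2))
```

On this toy data the search adds a rule on the informative feature and then stops, because any further rule costs more model bits than it saves in data bits. That stopping behavior is what lets a compression-based score control list length without a separate regularization hyperparameter.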

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.00328/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1905.00328/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/1905.00328/full.md

---
Source: https://tomesphere.com/paper/1905.00328