Feature quantization for parsimonious and interpretable predictive models

Adrien Ehrhardt; Christophe Biernacki; Vincent Vandewalle; Philippe Heinrich

arXiv:1903.08920·stat.ME·March 22, 2019·1 cites

TL;DR

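This paper introduces glmdisc, a method that estimates feature quantization (discretizing continuous features and grouping levels of categorical features) jointly with a logistic regression model rather than as a separate preprocessing step. It does so with a two-step strategy: the discontinuous quantization functions are approximated by smooth functions, and the relaxed problem is then solved with a neural network.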

Contribution

The paper proposes a two-step optimization strategy that embeds quantization estimation in the predictive estimation step itself, with the aim of improving accuracy and interpretability over quantization done as preprocessing.

Findings

01 Demonstrates improved prediction accuracy on real and simulated data.

02 Shows better interpretability with integrated quantization.

03 Outperforms traditional preprocessing approaches.

Abstract

For regulatory and interpretability reasons, logistic regression is still widely used. To improve prediction accuracy and interpretability, a preprocessing step quantizing both continuous and categorical data is usually performed: continuous features are discretized and, if numerous, levels of categorical features are grouped. An even better predictive accuracy can be reached by embedding this quantization estimation step directly into the predictive estimation step itself. But doing so, the predictive loss has to be optimized on a huge set. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating discontinuous quantization functions by smooth functions; second, the resulting relaxed optimization problem is solved via a particular neural network. The good performances of this approach, which we call…
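The relaxation described in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that the smooth surrogate is a softmax over one affine score per level, and that the relaxed model feeds these soft level memberships into a logistic regression. The function names (`hard_quantize`, `soft_quantize`, `sharp_parameters`, `predict_proba`) and the sharpness parameter `k` are hypothetical and are not taken from the paper or from the glmdisc package.

```python
import numpy as np

def hard_quantize(x, cutpoints):
    """One-hot encode the interval each value falls into (discontinuous in x)."""
    bins = np.digitize(x, cutpoints)            # interval index in 0..len(cutpoints)
    return np.eye(len(cutpoints) + 1)[bins]

def soft_quantize(x, slopes, intercepts):
    """Smooth surrogate (assumed form): softmax over one affine score per level."""
    scores = np.outer(x, slopes) + intercepts   # shape (n_samples, n_levels)
    scores -= scores.max(axis=1, keepdims=True) # subtract row max for stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def sharp_parameters(cutpoints, k):
    """Hypothetical helper: affine scores whose argmax matches the hard intervals.

    Level h gets score k * (h * x - sum of the first h cutpoints), so adjacent
    levels h and h + 1 tie exactly at cutpoint h. Larger k gives a sharper softmax.
    """
    levels = np.arange(len(cutpoints) + 1)
    slopes = k * levels
    intercepts = -k * np.concatenate([[0.0], np.cumsum(cutpoints)])
    return slopes, intercepts

def predict_proba(x, slopes, intercepts, coefs, bias):
    """Logistic regression on the soft level memberships (single feature)."""
    soft = soft_quantize(x, slopes, intercepts)
    return 1.0 / (1.0 + np.exp(-(soft @ coefs + bias)))

x = np.array([-2.0, 0.5, 3.0])
cutpoints = np.array([0.0, 1.0])
slopes, intercepts = sharp_parameters(cutpoints, k=50.0)
print(hard_quantize(x, cutpoints))
print(np.round(soft_quantize(x, slopes, intercepts), 3))
```

With a large `k` the soft memberships are numerically close to the one-hot encoding, while every parameter stays differentiable, which is what makes gradient-based training of the slopes, intercepts, and regression coefficients possible in principle. In an actual fit these parameters would be learned from data rather than set from known cutpoints as done here.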

Code & Models

Repositories

adimajo/glmdisc_python
tf

Taxonomy

Topics: Neural Networks and Applications · Image and Signal Denoising Methods · Machine Learning and Data Classification