Feature quantization for parsimonious and interpretable predictive models
Adrien Ehrhardt, Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich

TL;DR
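This paper introduces glmdisc, a method that embeds feature quantization (discretizing continuous features and grouping the levels of categorical ones) directly into the estimation of a logistic regression. It relies on a two-step optimization: the quantization functions are first relaxed into smooth functions, and the relaxed problem is then solved with a neural network. The result improves both accuracy and interpretability.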
Contribution
It proposes a new two-step optimization strategy embedding quantization into predictive models, improving accuracy and interpretability over traditional preprocessing methods.
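The relaxation behind this strategy can be illustrated on a single continuous feature. The sketch below is a hypothetical NumPy illustration, not the authors' code. It assumes that a feature is quantized into m ordered levels by taking the argmax of m affine scores alpha0[j] + alpha1[j] * x, and that the smooth surrogate is the row-wise softmax of those same scores. The parameter values and helper names (level_scores, hard_quantize, soft_quantize) are made up for illustration, and this page does not confirm that glmdisc uses exactly this parameterization.

```python
import numpy as np


def level_scores(x, alpha0, alpha1):
    """Affine score alpha0[j] + alpha1[j] * x for each observation and level j."""
    return alpha0[None, :] + alpha1[None, :] * np.asarray(x, dtype=float)[:, None]


def hard_quantize(x, alpha0, alpha1):
    """Discontinuous quantization: index of the highest-scoring level."""
    return level_scores(x, alpha0, alpha1).argmax(axis=1)


def soft_quantize(x, alpha0, alpha1):
    """Smooth relaxation (assumed form): row-wise softmax of the same scores."""
    s = level_scores(x, alpha0, alpha1)
    s -= s.max(axis=1, keepdims=True)  # shift by the row max for numerical stability
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical parameters: increasing slopes give ordered, contiguous intervals,
# here with cut points at x = 0.3 and x = 0.7.
alpha0 = np.array([0.0, -3.0, -10.0])
alpha1 = np.array([0.0, 10.0, 20.0])
x = np.array([0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9])

hard = hard_quantize(x, alpha0, alpha1)
soft = soft_quantize(x, alpha0, alpha1)
# Scaling all scores up sharpens the softmax toward the one-hot hard quantization.
sharp = soft_quantize(x, 50 * alpha0, 50 * alpha1)
print(hard)  # [0 0 1 1 1 2 2]
print(soft.round(2))
```

Because the softmax is differentiable in alpha0 and alpha1, the cut points can be learned by gradient methods. The argmax version cannot be learned this way, yet it is recovered as the scores grow.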
Findings
Demonstrates improved prediction accuracy on real and simulated data.
Shows better interpretability with integrated quantization.
Outperforms traditional preprocessing approaches.
Abstract
For regulatory and interpretability reasons, logistic regression is still widely used. To improve prediction accuracy and interpretability, a preprocessing step that quantizes both continuous and categorical data is usually performed: continuous features are discretized and, if numerous, the levels of categorical features are grouped. Even better predictive accuracy can be reached by embedding this quantization estimation step directly into the predictive estimation step itself. In doing so, however, the predictive loss has to be optimized over a huge set of candidate quantizations. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating the discontinuous quantization functions with smooth functions; second, the resulting relaxed optimization problem is solved via a particular neural network. The good performance of this approach, which we call…
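The two steps described in the abstract can be sketched end to end on one continuous feature. This is a hypothetical NumPy illustration, not the glmdisc implementation. Several choices here are assumptions: the softmax-of-affine-scores relaxation, the equal-width initialization, and the simulated step-shaped data. Plain full-batch gradient descent with hand-derived gradients also stands in for the paper's neural network. The quantization parameters (a0, a1) and the logistic coefficients (b0, w) are fitted jointly, and a hard quantization is read off by argmax at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: P(y = 1 | x) jumps from 0.2 to 0.8 at x = 0.5.
n, m = 2000, 3
x = rng.uniform(0.0, 1.0, size=n)
y = (rng.uniform(size=n) < np.where(x < 0.5, 0.2, 0.8)).astype(float)


def softmax_rows(s):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    s = s - s.max(axis=1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)


def log_loss(p, y, eps=1e-12):
    """Mean negative Bernoulli log-likelihood."""
    return -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))


# Quantization layer: level j has score a0[j] + a1[j] * x (assumed form),
# initialised as equal-width bins with cuts at x = 1/3 and x = 2/3.
a0 = np.array([0.0, -5.0 / 3.0, -5.0])
a1 = np.array([0.0, 5.0, 10.0])
# Logistic regression on the soft-quantized feature.
b0, w = 0.0, np.zeros(m)
lr, losses = 0.5, []

for _ in range(3000):
    Q = softmax_rows(a0 + a1 * x[:, None])   # (n, m) soft level memberships
    p = 1.0 / (1.0 + np.exp(-(b0 + Q @ w)))  # P(y = 1 | x)
    losses.append(log_loss(p, y))

    g = (p - y) / n                          # d loss / d linear predictor
    # Chain rule through the softmax: dL/ds_ij = Q_ij * g_i * (w_j - sum_k Q_ik w_k)
    dS = Q * g[:, None] * (w[None, :] - (Q @ w)[:, None])
    b0 -= lr * g.sum()
    w = w - lr * (Q.T @ g)
    a0 = a0 - lr * dS.sum(axis=0)
    a1 = a1 - lr * (dS * x[:, None]).sum(axis=0)

# Leave the relaxation: each observation gets its most probable level.
levels = np.argmax(a0 + a1 * x[:, None], axis=1)
print(f"log-loss {losses[0]:.3f} -> {losses[-1]:.3f}; levels used: {np.unique(levels)}")
```

If a level never wins the argmax after fitting, it simply disappears from the hard quantization, so the number of levels used can be smaller than m. A categorical feature could be treated the same way by giving each original level its own free vector of scores, though that is also an assumption of this sketch.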
Taxonomy
Topics: Neural Networks and Applications · Image and Signal Denoising Methods · Machine Learning and Data Classification
