Binarsity: a penalization for one-hot encoded features in linear supervised learning
Mokhtar Z. Alaya, Simon Bussy, Stéphane Gaïffas, Agathe Guilloux

TL;DR
This paper introduces 'binarsity', a new penalization technique for one-hot encoded features in large-scale linear supervised learning, promoting piecewise constant and block sparse weights with efficient computation.
Contribution
The paper proposes the binarsity penalization, combining total-variation regularization with linear constraints, to improve feature selection and interpretability in one-hot encoded continuous features.
Findings
Achieves good empirical performance on multiple datasets.
Provides non-asymptotic oracle inequalities for generalized linear models.
Computational complexity comparable to standard L1 penalization.
Abstract
This paper deals with the problem of large-scale linear supervised learning in settings where a large number of continuous features are available. We propose to combine the well-known trick of one-hot encoding of continuous features with a new penalization called 'binarsity'. In each group of binary features coming from the one-hot encoding of a single raw continuous feature, this penalization uses total-variation regularization together with an extra linear constraint. This induces two interesting properties on the model weights of the one-hot encoded features: they are piecewise constant, and are eventually block sparse. Non-asymptotic oracle inequalities for generalized linear models are proposed. Moreover, under a sparse additive model assumption, we prove that our procedure matches the state-of-the-art in this setting. Numerical experiments illustrate the good performances of…
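The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed cut points, the unit weights on the jumps, and the choice of a sum-to-zero condition as the "extra linear constraint" are all assumptions made for illustration, since the abstract does not specify them. The sketch one-hot encodes a continuous feature into interval bins, then evaluates a per-block total-variation penalty on the corresponding weights.

```python
from bisect import bisect_right


def one_hot_bins(values, cut_points):
    """One-hot encode a continuous feature using interval bins.

    cut_points must be sorted; they define len(cut_points) + 1 bins.
    (Hypothetical helper; the binning rule is an assumption.)
    """
    n_bins = len(cut_points) + 1
    rows = []
    for v in values:
        k = bisect_right(cut_points, v)  # index of the bin containing v
        row = [0] * n_bins
        row[k] = 1
        rows.append(row)
    return rows


def total_variation(block):
    """Sum of absolute jumps between consecutive weights of one block."""
    return sum(abs(b - a) for a, b in zip(block, block[1:]))


def binarsity_like_penalty(blocks, tol=1e-12):
    """Sum of per-block total variation.

    Returns infinity if a block violates the ASSUMED sum-to-zero
    constraint; the exact constraint is not given in the abstract.
    """
    total = 0.0
    for block in blocks:
        if abs(sum(block)) > tol:
            return float("inf")
        total += total_variation(block)
    return total


# One raw feature, four bins defined by three illustrative cut points.
encoded = one_hot_bins([0.1, 0.4, 0.35, 0.8], [0.25, 0.5, 0.75])

# Two blocks of weights: the first is piecewise constant with one jump,
# the second is entirely zero (the raw feature is dropped: block sparsity).
weights = [[-1.0, -1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
penalty = binarsity_like_penalty(weights)
```

In this toy example only the single jump in the first block contributes to the penalty, which is why weights that are constant across neighbouring bins, or zero across a whole block, are favoured.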
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Statistical Methods and Inference · Structural Health Monitoring Techniques
