Modeling with Categorical Features via Exact Fusion and Sparsity Regularisation
Kayhan Behdin, Riade Benbaki, Peter Radchenko, Rahul Mazumder

TL;DR
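This paper introduces a high-dimensional linear regression estimator for categorical features with many levels. It compresses the model in two ways, by clustering regression coefficients so that some levels collapse together and by encouraging sparsity, and it is computed with mixed integer programming formulations and a fast approximate algorithm.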
Contribution
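It develops a new estimation approach with mixed integer programming formulations, a custom row generation procedure that speeds up exact off-the-shelf solvers, and a block coordinate descent algorithm built on an exact dynamic programming solver for the univariate case, along with theoretical guarantees.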
Findings
Outperforms state-of-the-art methods on synthetic and real datasets.
Provides theoretical guarantees for prediction and cluster recovery.
Develops a fast approximate algorithm, based on block coordinate descent, that obtains high-quality feasible solutions.
Abstract
We study the high-dimensional linear regression problem with categorical predictors that have many levels. We propose a new estimation approach, which performs model compression via two mechanisms by simultaneously encouraging (a) clustering of the regression coefficients to collapse some of the categorical levels together; and (b) sparsity of the regression coefficients. We present novel mixed integer programming formulations for our estimator, and develop a custom row generation procedure to speed up the exact off-the-shelf solvers. We also propose a fast approximate algorithm for our method that obtains high-quality feasible solutions via block coordinate descent. As the main building block of our algorithm, we develop an exact algorithm for the univariate case based on dynamic programming, which can be of independent interest. We establish new theoretical guarantees for both the…
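The two compression mechanisms named in the abstract can be illustrated with a small sketch. This is not the paper's estimator or algorithm: the coefficient vector `beta`, the helper names `one_hot` and `compress`, and the toy data are all hypothetical, chosen only to show that when several levels of a categorical feature share a coefficient (clustering) or have a zero coefficient (sparsity), the fitted model can be stored with fewer distinct parameters without changing its predictions.

```python
import numpy as np

def one_hot(levels, n_levels):
    """Build a one-hot design matrix from integer level codes."""
    X = np.zeros((len(levels), n_levels))
    X[np.arange(len(levels)), levels] = 1.0
    return X

def compress(beta, decimals=8):
    """Group levels whose coefficients coincide (up to rounding).

    Returns the distinct coefficient values and, for each level,
    the index of the cluster it belongs to.
    """
    beta = np.round(np.asarray(beta, dtype=float), decimals)
    values, level_to_cluster = np.unique(beta, return_inverse=True)
    return values, level_to_cluster

# Hypothetical fitted coefficients for a 6-level categorical feature:
# levels 1, 2 and 5 are fused, levels 0 and 3 are zeroed out.
beta = np.array([0.0, 1.5, 1.5, 0.0, -2.0, 1.5])
levels = np.array([0, 1, 2, 3, 4, 5, 2, 0])  # toy observations

values, level_to_cluster = compress(beta)

# Predictions from the full one-hot model and from the compressed model.
full_pred = one_hot(levels, len(beta)) @ beta
compressed_pred = values[level_to_cluster[levels]]

n_nonzero = int(np.count_nonzero(beta))  # sparsity: nonzero coefficients
n_clusters = len(values)                 # clustering: distinct values, zero included
```

Here six levels reduce to three distinct coefficient values, and both models give identical predictions. How the paper actually penalises clustering and sparsity is not stated in this summary.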
