Sparse Classification: a scalable discrete optimization perspective

Dimitris Bertsimas; Jean Pauphilet; Bart Van Parys

arXiv:1710.01352 · math.OC · January 8, 2025 · Mach. Learn.

TL;DR

This paper introduces a binary convex optimization approach with a cutting-plane algorithm for sparse classification, achieving exact solutions efficiently and outperforming Lasso in support recovery and sparsity on synthetic and real data.

Contribution

It formulates sparse classification as a binary convex optimization problem and proposes an exact, scalable cutting-plane algorithm, backed by a theoretical analysis of support recovery.
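A sketch of what such a formulation can look like, written out to make the phrase "binary convex problem" concrete. It assumes a cardinality-constrained, ridge-regularized loss minimization; the symbols $\ell$ (loss), $\gamma$ (regularization weight), $k$ (sparsity budget), $s$ (binary support indicator), $c(s)$ (optimal value for a fixed support) and $\eta$ (epigraph variable) are illustrative choices not stated on this page, and the paper's exact notation may differ.

```latex
% Assumed starting point: regularized loss minimization with at most k nonzero weights
\min_{w \in \mathbb{R}^p,\; b \in \mathbb{R}} \;
  \sum_{i=1}^{n} \ell\left(y_i,\, w^\top x_i + b\right) + \frac{1}{2\gamma} \|w\|_2^2
\quad \text{s.t.} \quad \|w\|_0 \le k

% Binary convex view: choose the support s, with c(s) assumed convex in s
\min_{s \in \{0,1\}^p} \; c(s)
\quad \text{s.t.} \quad \sum_{j=1}^{p} s_j \le k

% Cutting-plane master problem after evaluating c at supports s^1, \dots, s^T
\min_{s \in \{0,1\}^p,\; \eta} \; \eta
\quad \text{s.t.} \quad
  \eta \ge c(s^t) + \nabla c(s^t)^\top (s - s^t), \quad t = 1, \dots, T,
\qquad \sum_{j=1}^{p} s_j \le k
```

Under these assumptions, each cut is a linear lower bound on $c$, so the master problem tightens with every evaluated support until its optimal value matches the best $c(s^t)$ found.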

Findings

01

The algorithm finds optimal solutions for sparse logistic regression and sparse SVM with $n$ and $p$ in the 10,000s within minutes.

02

Achieves perfect support recovery on synthetic data in the large-sample regime.

03

Returns sparser classifiers with similar accuracy compared to Lasso on real data.

Abstract

We formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm finds optimal solutions for $n$ and $p$ in the $10{,}000$s within minutes. On synthetic data, our algorithm achieves perfect support recovery in the large sample regime. Namely, there exists an $n_{0}$ such that the algorithm takes a long time to find the optimal solution and does not recover the correct support for $n < n_{0}$, while for $n \geq n_{0}$, the algorithm quickly detects all the true features and does not return any false features. In contrast, while Lasso accurately detects all the true features, it persistently returns incorrect features, even as the number of observations increases. Consequently, on numerous real-world experiments, our…
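To make the support-recovery language in the abstract concrete, here is a minimal Python sketch of two metrics one could compute on synthetic data: the fraction of true features detected and the fraction of returned features that are false. The function name `support_metrics` and the example index sets are hypothetical and chosen only for illustration; they are not taken from the paper or its repository.

```python
def support_metrics(true_support, selected):
    """Return (detection_rate, false_rate) for a selected feature set.

    detection_rate: share of true features that were selected.
    false_rate: share of selected features that are not true features.
    Both names are illustrative, not the paper's terminology.
    """
    true_set, sel_set = set(true_support), set(selected)
    # Share of the true support that the method found (1.0 if nothing to find).
    detection_rate = len(true_set & sel_set) / len(true_set) if true_set else 1.0
    # Share of the returned features that are spurious (0.0 if nothing returned).
    false_rate = len(sel_set - true_set) / len(sel_set) if sel_set else 0.0
    return detection_rate, false_rate


# Hypothetical example: three true features, indexed 0..2.
true_features = [0, 1, 2]
exact_recovery = support_metrics(true_features, [0, 1, 2])
with_false_features = support_metrics(true_features, [0, 1, 2, 5, 7, 9])
```

In this reading, "perfect support recovery" corresponds to a detection rate of 1 with a false rate of 0, while the behavior the abstract attributes to Lasso corresponds to a detection rate of 1 with a false rate that stays above 0.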

Code & Models

Repositories

jeanpauphilet/SubsetSelectionCIO.jl