Sparse Oracle Inequalities for Variable Selection via Regularized Quantization
Clément Levrard (LPMA)

TL;DR
This paper develops oracle inequalities for a combined quantization and variable selection method using weighted Lasso k-means, providing theoretical guarantees for sparsity adaptation and support recovery in clustering models.
Contribution
It introduces a novel weighted Lasso k-means procedure with theoretical guarantees for sparsity adaptation and support recovery, extending previous work to general weights and non-sparse optimal codebooks.
Findings
Procedure adapts to the sparsity of optimal codebooks.
When the optimal codebooks have a sparse support, that support is recovered asymptotically.
Effective in Gaussian mixture models with sparse means.
Abstract
We give oracle inequalities on procedures that combine quantization and variable selection via a weighted Lasso k-means type algorithm. The results are derived for a general family of weights, which can be tuned to size the influence of the variables in different ways. Moreover, these theoretical guarantees are proved to adapt to the corresponding sparsity of the optimal codebooks, if appropriate. Even if there is no sparsity assumption on the optimal codebooks, our procedure is proved to be close to a sparse approximation of the optimal codebooks, as has been done for the Generalized Linear Models in regression. If the optimal codebooks have a sparse support, we also show that this support can be asymptotically recovered, giving an asymptotic upper bound on the probability of misclassification. These results are illustrated with Gaussian mixture models in arbitrary dimension with…
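To make the idea of a weighted Lasso k-means type procedure concrete, here is a minimal Python sketch. It is not the paper's algorithm: the exact penalized criterion is not given in this summary, so the sketch assumes a plain weighted L1 penalty on the codebook coordinates, minimized by a Lloyd-type alternation in which each cluster mean is soft-thresholded coordinate by coordinate. The names `weighted_lasso_kmeans`, `lam`, `weights`, and `init` are hypothetical and chosen for illustration only.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t, setting small entries to exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def weighted_lasso_kmeans(X, k, lam, weights, n_iter=50, init=None, seed=0):
    """Lloyd-type heuristic for the ASSUMED criterion

        (1/n) * sum_i min_j ||x_i - c_j||^2  +  lam * sum_j sum_p weights[p] * |c_j[p]|

    X: (n, d) data, weights: (d,) nonnegative per-variable weights.
    Returns the (k, d) codebook and the (n,) cluster labels.
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n, _ = X.shape
    if init is None:
        # Assumption: initialize with k distinct data points chosen at random.
        rng = np.random.default_rng(seed)
        codebook = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        codebook = np.array(init, dtype=float)

    def assign(cb):
        # Squared Euclidean distance from every point to every code point.
        dists = ((X[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(codebook)
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous code point for an empty cell
            # With the assignment fixed, each coordinate solves a 1-D Lasso
            # problem whose minimizer is the soft-thresholded cluster mean.
            thresh = lam * weights * n / (2.0 * len(members))
            codebook[j] = soft_threshold(members.mean(axis=0), thresh)

    return codebook, assign(codebook)
```

Under this assumed penalty, a larger `weights[p]` makes coordinate `p` more likely to be set to zero in every code point, which is one way to read "tuning the influence of the variables". Coordinates that are zero across the whole returned codebook can be interpreted as deselected variables.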
