Statistical mechanics of sparse generalization and model selection

Alejandro Lage-Castellanos; Andrea Pagnani; Martin Weigt

arXiv:0907.3241·cond-mat.dis-nn·February 9, 2012

TL;DR

This paper uses statistical mechanics to analyze sparse model selection in high-dimensional inference, comparing different dilution methods and highlighting the near-optimal performance of $L_0$ dilution in certain regimes.

Contribution

It provides a theoretical analysis of sparse generalization using replica methods, comparing naive, $L_1$, and $L_0$ dilutions, and identifies conditions where $L_0$ is nearly optimal.

Findings

01 Both $L_p$-diluted approaches ($L_1$ and $L_0$) clearly outperform naive dilution

02 $L_0$ dilution works almost perfectly in a small region

03 $L_0$ outperforms $L_1$ in specific conditions

Abstract

One of the crucial tasks in many inference problems is the extraction of sparse information out of a given number of high-dimensional measurements. In machine learning, this is frequently achieved using, as a penalty term, the $L_{p}$ norm of the model parameters, with $p \leq 1$ for efficient dilution. Here we propose a statistical-mechanics analysis of the problem in the setting of perceptron memorization and generalization. Using a replica approach, we are able to evaluate the relative performance of naive dilution (obtained by learning without dilution, followed by applying a threshold to the model parameters), $L_{1}$ dilution (which is frequently used in convex optimization) and $L_{0}$ dilution (which is optimal but computationally hard to implement). Whereas both $L_{p}$ diluted approaches clearly outperform the naive approach, we find a small region where $L_{0}$ works almost perfectly…
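To make the three notions of dilution concrete, here is a minimal Python sketch on a toy weight vector. It is an illustration only, not the replica calculation of the paper: the function names (`lp_penalty`, `naive_dilution`, `soft_threshold`), the example weights, and the threshold value are all assumptions introduced here. Soft-thresholding is used as a stand-in for $L_1$ dilution because it is the standard proximal step for an $L_1$ penalty in convex optimization; the abstract does not say which solver the authors have in mind.

```python
import numpy as np

def lp_penalty(w, p):
    """L_p penalty sum_i |w_i|^p; for p = 0 it counts the non-zero entries."""
    w = np.asarray(w, dtype=float)
    if p == 0:
        return float(np.count_nonzero(w))
    return float(np.sum(np.abs(w) ** p))

def naive_dilution(w, threshold):
    """Naive dilution: zero every weight whose magnitude is below `threshold`."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) >= threshold, w, 0.0)

def soft_threshold(w, lam):
    """Proximal operator of lam * ||w||_1: shrink each weight toward zero by lam."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

# Hypothetical weights of a perceptron trained without any sparsity penalty.
w = np.array([0.9, -0.05, 0.0, 0.3, -1.2])

w_naive = naive_dilution(w, 0.2)  # small weights removed, the rest untouched
w_l1 = soft_threshold(w, 0.2)     # small weights removed, the rest shrunk

print("L0 penalty before:", lp_penalty(w, 0))
print("L1 penalty before:", lp_penalty(w, 1))
print("naive dilution:   ", w_naive)
print("soft threshold:   ", w_l1)
```

Both operations produce the same support here, but naive dilution leaves the surviving weights unchanged while the $L_1$ step also shrinks them. The $L_0$ penalty, by contrast, depends only on how many weights are non-zero, which is why minimizing it directly is a combinatorial problem.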
