# SuperMix: Sparse Regularization for Mixtures

**Authors:** Yohann de Castro (ECL, ICJ), Sébastien Gadat (TSE), Clément Marteau (ICJ), Cathy Maugis (IMT)

arXiv: 1907.10592 · 2020-06-22

## TL;DR

This paper introduces a grid-less convex program, referred to as Beurling-LASSO, for estimating the discrete mixing measure of a kernel mixture model. It establishes quantitative support localization error bounds and a non-asymptotic support stability property, and describes tractable algorithms for solving the program.

## Contribution

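It formulates mixture estimation as a "data fitting and regularization" convex program over the space of measures, derives support localization error bounds and a non-asymptotic support stability property for the resulting estimator, and proposes tractable algorithms for solving the program.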

## Key findings

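- Support localization error bounds hold once the bandwidth of the data fitting term exceeds a lower bound that depends only on the support of the mixing measure and its minimum separation
- Under a non-degenerate source condition and for a sufficiently large sample size, the estimator has exactly as many weighted Dirac masses as the target, converging in amplitude and location towards the true ones
- The convex program can be solved by tractable algorithms based on Sliding Frank-Wolfe or Conic Particle Gradient Descent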

## Abstract

This paper investigates the statistical estimation of a discrete mixing measure $\mu_0$ involved in a kernel mixture model. Using some recent advances in $\ell_1$-regularization over the space of measures, we introduce a "data fitting and regularization" convex program for estimating $\mu_0$ in a grid-less manner from a sample of the mixture law; this method is referred to as Beurling-LASSO. Our contribution is twofold: we derive a lower bound on the bandwidth of our data fitting term depending only on the support of $\mu_0$ and its so-called "minimum separation" to ensure quantitative support localization error bounds; and under a so-called "non-degenerate source condition" we derive a non-asymptotic support stability property. The latter shows that for a sufficiently large sample size $n$, our estimator has exactly as many weighted Dirac masses as the target $\mu_0$, converging in amplitude and localization towards the true ones. Finally, we also introduce some tractable algorithms for solving this convex program based on "Sliding Frank-Wolfe" or "Conic Particle Gradient Descent". The statistical performance of this estimator is investigated by designing a so-called "dual certificate", which is appropriate to our setting. Some classical situations, such as mixtures of super-smooth distributions (e.g. Gaussian distributions) or ordinary-smooth distributions (e.g. Laplace distributions), are discussed at the end of the paper.
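To make the setting concrete, here is a minimal sketch of the kind of data the abstract describes. It assumes a one-dimensional Gaussian location mixture whose mixing measure is a weighted sum of Dirac masses at `locations`. The function names, the weights, the locations, and the kernel scale are illustrative choices, not values taken from the paper, and the code only simulates the model; it does not implement Beurling-LASSO.

```python
import math
import random


def sample_mixture(weights, locations, n, scale=1.0, seed=0):
    """Draw n points from the mixture sum_k weights[k] * N(locations[k], scale^2).

    The discrete mixing measure is sum_k weights[k] * delta_{locations[k]}
    (hypothetical example values, not from the paper).
    """
    rng = random.Random(seed)
    # Pick the Dirac mass each observation comes from, according to the weights.
    centers = rng.choices(locations, weights=weights, k=n)
    # Add Gaussian kernel noise around the chosen location.
    return [t + rng.gauss(0.0, scale) for t in centers]


def mixture_density(x, weights, locations, scale=1.0):
    """Evaluate the mixture density at x: the Gaussian kernel convolved with the mixing measure."""
    norm = scale * math.sqrt(2.0 * math.pi)
    return sum(
        a * math.exp(-0.5 * ((x - t) / scale) ** 2) / norm
        for a, t in zip(weights, locations)
    )


# Illustrative mixing measure with two well-separated Dirac masses.
weights = [0.3, 0.7]
locations = [-2.0, 3.0]
sample = sample_mixture(weights, locations, n=2000, scale=0.5)
```

An estimator in this setting sees only `sample` and must recover both the number of masses and their weights and locations; the distance between the entries of `locations` plays the role of the "minimum separation" mentioned in the abstract.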

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10592/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1907.10592/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1907.10592/full.md

---
Source: https://tomesphere.com/paper/1907.10592