Robust learning of data anomalies with analytically-solvable entropic outlier sparsification
Illia Horenko

TL;DR
This paper introduces Entropic Outlier Sparsification (EOS), a robust method with an analytical solution for detecting data anomalies across various learning scenarios, offering computational efficiency and theoretical insights.
Contribution
The paper presents a closed-form solution for EOS, enabling efficient anomaly detection and providing theoretical justification for Gaussian mixture models in data analysis.
Findings
EOS outperforms traditional methods on synthetic data
The closed-form solution adds per-iteration cost that is linear in the statistics size and independent of the data dimension
Gaussian mixtures are optimal for squared Euclidean distances
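The link between entropy regularization and Gaussians in the last finding can be sketched as a short derivation. This is only an illustration under an assumed problem form: the symbols $w_i$ (weight of data point $i$), $e_i$ (its error), $\alpha > 0$ (regularization strength) and $\theta$ (a model parameter such as a cluster center) are notation chosen here and need not match the paper's.

```latex
\begin{aligned}
&\min_{w \in \Delta_T} \; \sum_{i=1}^{T} w_i e_i + \alpha \sum_{i=1}^{T} w_i \log w_i ,
\qquad \Delta_T = \Big\{ w : w_i \ge 0,\ \sum_{i=1}^{T} w_i = 1 \Big\} \\
&\Longrightarrow \quad
w_i^{*} = \frac{\exp(-e_i/\alpha)}{\sum_{j=1}^{T} \exp(-e_j/\alpha)} . \\
&\text{With } e_i = \lVert x_i - \theta \rVert_2^2 : \quad
w_i^{*} \propto \exp\!\left( -\frac{\lVert x_i - \theta \rVert_2^2}{\alpha} \right) .
\end{aligned}
```

Under these assumptions the optimal weights have the shape of a spherically symmetric Gaussian kernel centered at $\theta$ with covariance $(\alpha/2) I$, which is consistent with the finding stated above.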
Abstract
Entropic Outlier Sparsification (EOS) is proposed as a robust computational strategy for the detection of data anomalies in a broad class of learning methods, including unsupervised problems (such as the detection of non-Gaussian outliers in mostly Gaussian data) and supervised learning with mislabeled data. EOS builds on a derived analytic closed-form solution of the (weighted) expected error minimization problem subject to Shannon entropy regularization. In contrast to common regularization strategies, whose computational costs scale polynomially with the data dimension, the identified closed-form solution is proven to impose additional iteration costs that depend linearly on the statistics size and are independent of the data dimension. The obtained analytic results also explain why the mixtures of spherically-symmetric Gaussians - used heuristically in many popular data analysis…
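The closed-form weighting described in the abstract can be illustrated with a minimal Python sketch. It assumes the regularized problem is a weighted sum of per-point errors plus `alpha` times the negative Shannon entropy of the weights, minimized over the probability simplex; in that case the minimizer is a softmax of the negated, scaled errors. The function name `entropic_weights`, the parameter `alpha`, the median-based center, and the synthetic data are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def entropic_weights(errors, alpha):
    """Closed-form minimizer of sum_i w_i*e_i + alpha*sum_i w_i*log(w_i)
    over the probability simplex (assumed problem form, see lead-in)."""
    errors = np.asarray(errors, dtype=float)
    # Subtracting the minimum error leaves the result unchanged but avoids underflow.
    z = -(errors - errors.min()) / alpha
    w = np.exp(z)
    return w / w.sum()


# Synthetic example (illustrative only): 200 Gaussian inliers and 5 shifted outliers.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(200, 5))
outliers = rng.normal(8.0, 1.0, size=(5, 5))
X = np.vstack([inliers, outliers])

# Per-point error: squared Euclidean distance to a robust center estimate.
center = np.median(X, axis=0)
errors = np.sum((X - center) ** 2, axis=1)

# The weight computation touches only the T error values, so its cost is
# linear in the number of points and does not depend on the data dimension.
w = entropic_weights(errors, alpha=5.0)

# Points with the smallest weights are the outlier candidates.
suspects = np.argsort(w)[:5]
```

In this sketch, smaller values of `alpha` concentrate the weights on the best-fitting points, while larger values push the weights toward uniform. How `alpha` is selected in the paper is not stated in the text above.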
Taxonomy
Topics: Advanced Statistical Methods and Models · Anomaly Detection Techniques and Applications
