Robust learning of data anomalies with analytically-solvable entropic outlier sparsification

Illia Horenko

arXiv:2112.11768 · stat.ME · June 8, 2022

Open Access

TL;DR

This paper introduces Entropic Outlier Sparsification (EOS), a robust method with an analytical solution for detecting data anomalies across various learning scenarios, offering computational efficiency and theoretical insights.

Contribution

The paper presents a closed-form solution for EOS, enabling efficient anomaly detection and providing theoretical justification for Gaussian mixture models in data analysis.

Findings

01 EOS outperforms traditional methods on synthetic data

02 Efficient linear-cost computation independent of data dimension

03 Gaussian mixtures are optimal for squared Euclidean distances

Abstract

Entropic Outlier Sparsification (EOS) is proposed as a robust computational strategy for detecting data anomalies in a broad class of learning methods, including unsupervised problems (such as the detection of non-Gaussian outliers in mostly-Gaussian data) and supervised learning with mislabeled data. EOS builds on the derived analytic closed-form solution of the (weighted) expected error minimization problem subject to Shannon entropy regularization. In contrast to common regularization strategies, whose computational costs scale polynomially with the data dimension, the identified closed-form solution is proven to impose additional iteration costs that depend linearly on the statistics size and are independent of the data dimension. The obtained analytic results also explain why the mixtures of spherically-symmetric Gaussians - used heuristically in many popular data analysis…
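To make the closed-form idea concrete, here is a minimal sketch, not the paper's own code. It assumes the regularized problem has the form: minimize sum_t w_t * e_t + alpha * sum_t w_t * log(w_t) over weights w on the probability simplex, where e_t is the per-sample error and alpha > 0 is the regularization strength. Under that assumption the minimizer is the standard Gibbs (softmax) weighting w_t proportional to exp(-e_t / alpha), which costs one pass over the per-sample errors and does not involve the data dimension. The function name `entropic_weights`, the value of `alpha`, and the toy data are illustrative assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def entropic_weights(errors, alpha):
    """Closed-form minimizer of sum_t w_t*e_t + alpha*sum_t w_t*log(w_t)
    over the probability simplex (assumed form of the problem)."""
    errors = np.asarray(errors, dtype=float)
    # Subtract the minimum error before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    z = -(errors - errors.min()) / alpha
    w = np.exp(z)
    return w / w.sum()

# Toy data (assumption): mostly-Gaussian points with a few planted outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:5] += 8.0  # the first five rows are shifted far from the bulk

# Per-sample error: squared Euclidean distance to the sample mean.
errors = ((X - X.mean(axis=0)) ** 2).sum(axis=1)

# Cost is linear in the number of samples once the errors are available.
w = entropic_weights(errors, alpha=5.0)
```

In this sketch the planted outliers receive weights that are many orders of magnitude smaller than those of the bulk, so thresholding `w` flags them. Smaller `alpha` concentrates the weights on low-error samples, while larger `alpha` pushes them toward uniform.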

Taxonomy

Topics: Advanced Statistical Methods and Models · Anomaly Detection Techniques and Applications