Robust High-Dimensional Mean Estimation With Low Data Size, an Empirical Study
Cullen Anderson, Jeff M. Phillips

TL;DR
This paper empirically compares robust high-dimensional mean estimation methods on corrupted data when the data size is too small, relative to the dimension, to meet the requirements that existing algorithms assume.
Contribution
It provides an extensive experimental comparison of robust mean estimation techniques in high-dimensional, low-data regimes, an area with limited practical evaluation.
Findings
Certain algorithms perform better with limited data
Trade-offs between robustness and data efficiency are observed
Empirical results challenge some theoretical assumptions
Abstract
Robust statistics aims to compute quantities that represent data when a fraction of it may be arbitrarily corrupted. The most essential statistic is the mean, and in recent years, there has been a flurry of theoretical advancement for efficiently estimating the mean in high dimensions on corrupted data. While several algorithms have been proposed that achieve near-optimal error, they all rely on large data size requirements as a function of dimension. In this paper, we perform extensive experiments on various mean estimation techniques in settings where the data size might not meet this requirement due to the high dimension.
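To make the setting in the abstract concrete, here is a minimal sketch of a corrupted sample whose size is smaller than its dimension. Everything in it is an illustrative assumption rather than the paper's experimental setup: the Gaussian inliers, the single far-away outlier point, the values of n, d, and eps, and the helper names. The coordinate-wise median is used only as a simple, well-known robust baseline; the listing does not say which estimators the paper compares.

```python
import math
import random
import statistics


def make_corrupted_sample(n, d, eps, rng, outlier_value=10.0):
    """Return n points in d dimensions; the true mean of the inliers is the origin.

    Assumption for illustration: an eps fraction of the points is replaced by
    one far-away point, a simple (not worst-case) corruption model.
    """
    n_bad = round(eps * n)
    inliers = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n - n_bad)]
    outliers = [[outlier_value] * d for _ in range(n_bad)]
    return inliers + outliers


def sample_mean(points):
    # Average each coordinate over all points (not robust to outliers).
    return [statistics.fmean(col) for col in zip(*points)]


def coordinatewise_median(points):
    # Median of each coordinate taken independently (a simple robust baseline).
    return [statistics.median(col) for col in zip(*points)]


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


rng = random.Random(0)
# Low-data regime: fewer points (n = 50) than dimensions (d = 100).
points = make_corrupted_sample(n=50, d=100, eps=0.1, rng=rng)

# The true mean is the origin, so the norm of an estimate is its error.
err_mean = l2_norm(sample_mean(points))
err_median = l2_norm(coordinatewise_median(points))
print(f"sample mean error:            {err_mean:.2f}")
print(f"coordinate-wise median error: {err_median:.2f}")
```

In this toy setup the sample mean is pulled toward the outliers in every coordinate, while the coordinate-wise median is shifted much less. It says nothing about how the more sophisticated estimators from the theoretical literature behave when n is small relative to d, which is the question the paper studies.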
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Advanced Statistical Process Monitoring
