High-Dimensional Robust Mean Estimation with Untrusted Batches
Maryam Aliakbarpour, Vladimir Braverman, Yuhan Liu, Junze Yin

TL;DR
This paper develops algorithms for high-dimensional mean estimation in a setting with untrusted data batches, addressing both adversarial users and distributional heterogeneity, and achieves minimax-optimal error bounds.
Contribution
It introduces two Sum-of-Squares-based algorithms that handle tiered corruption in high-dimensional, multi-user settings, with provably optimal error rates.
Findings
Achieves a minimax-optimal error rate of O(√(ε/n) + √(d/(nN)) + √α)
Demonstrates suppression of adversarial influence by batch averaging
Addresses new challenges in high-dimensional, sample-level corruption scenarios
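To get a feel for the stated rate, the sketch below evaluates its three terms separately, with constants omitted. It is only an illustration, not the paper's algorithm. The reading of the symbols is an assumption inferred from the abstract: ε as the fraction of adversarial users, n as the batch size, N as the number of users, d as the dimension, and α as the proximity parameter. The function name and the parameter values are hypothetical.

```python
import math

def error_rate_terms(eps, n, N, d, alpha):
    """Return the three terms of sqrt(eps/n) + sqrt(d/(n*N)) + sqrt(alpha).

    Constants hidden by the O(.) notation are omitted. The meaning of each
    symbol is an assumption (see the text above), not taken from the paper.
    """
    adversarial = math.sqrt(eps / n)        # assumed: cost of adversarial users
    statistical = math.sqrt(d / (n * N))    # assumed: sampling error over n*N points
    heterogeneity = math.sqrt(alpha)        # assumed: cost of distribution shift
    return adversarial, statistical, heterogeneity

# Hypothetical parameter values, chosen only for illustration.
small_batches = error_rate_terms(eps=0.1, n=100, N=1000, d=50, alpha=1e-4)
large_batches = error_rate_terms(eps=0.1, n=400, N=1000, d=50, alpha=1e-4)

for label, terms in (("n=100", small_batches), ("n=400", large_batches)):
    print(label, [round(t, 5) for t in terms], "sum:", round(sum(terms), 5))
```

Under these assumptions, quadrupling the batch size halves the first two terms, which is consistent with the claim that batch averaging suppresses adversarial influence, while the √α term does not depend on n.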
Abstract
We study high-dimensional mean estimation in a collaborative setting where data is contributed by users in batches of size n. In this environment, a learner seeks to recover the mean of a true distribution from a collection of sources that are both statistically heterogeneous and potentially malicious. We formalize this challenge through a double corruption landscape: an ε-fraction of users are entirely adversarial, while the remaining "good" users provide data from distributions that are related to the true distribution but deviate from it by a proximity parameter α. Unlike existing work on the untrusted batch model, which typically measures this deviation via total variation distance in discrete settings, we address the continuous, high-dimensional regime under two natural variants for deviation: (1) good batches are drawn from distributions with a mean-shift of…
Taxonomy
Topics: Machine Learning and Algorithms · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques
