High-Dimensional Robust Mean Estimation with Untrusted Batches

Maryam Aliakbarpour; Vladimir Braverman; Yuhan Liu; Junze Yin

arXiv:2602.20698·cs.LG·February 25, 2026

Open Access

TL;DR

This paper develops algorithms for high-dimensional mean estimation in a setting with untrusted data batches, addressing both adversarial users and distributional heterogeneity, and achieves minimax-optimal error bounds.

Contribution

The paper introduces two Sum-of-Squares-based algorithms that handle tiered corruption in high-dimensional, multi-user environments and attain provably optimal error rates.

Findings

01 Achieves a minimax-optimal error rate of O(√(ε/n) + √(d/(nN)) + √α)

02 Demonstrates suppression of adversarial influence by batch averaging

03 Addresses new challenges in high-dimensional, sample-level corruption scenarios
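
To make the stated rate concrete, the following minimal sketch evaluates its three terms separately. It takes the hidden O(·) constant to be 1 and reads the second term as d divided by the product nN; both are assumptions, and the function name `error_rate_terms` and the parameter values are hypothetical.

```python
import math


def error_rate_terms(eps, alpha, n, N, d):
    """Return the three terms of the stated rate.

    Assumption: the constant hidden by O(.) is taken to be 1, so only
    relative comparisons between settings are meaningful.
    """
    return {
        "adversarial_users": math.sqrt(eps / n),   # sqrt(eps / n)
        "sampling": math.sqrt(d / (n * N)),        # sqrt(d / (n N))
        "deviation": math.sqrt(alpha),             # sqrt(alpha)
    }


# Hypothetical settings that differ only in the batch size n.
small = error_rate_terms(eps=0.1, alpha=0.01, n=100, N=1000, d=50)
large = error_rate_terms(eps=0.1, alpha=0.01, n=400, N=1000, d=50)

for name in small:
    print(f"{name}: n=100 -> {small[name]:.4f}, n=400 -> {large[name]:.4f}")
```

Under this reading, quadrupling the batch size n halves the adversarial-user and sampling terms, while the √α term does not depend on n at all.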

Abstract

We study high-dimensional mean estimation in a collaborative setting where data is contributed by $N$ users in batches of size $n$. In this environment, a learner seeks to recover the mean $μ$ of a true distribution $P$ from a collection of sources that are both statistically heterogeneous and potentially malicious. We formalize this challenge through a double corruption landscape: an $ε$-fraction of users are entirely adversarial, while the remaining "good" users provide data from distributions that are related to $P$, but deviate by a proximity parameter $α$. Unlike existing work on the untrusted batch model, which typically measures this deviation via total variation distance in discrete settings, we address the continuous, high-dimensional regime under two natural variants for deviation: (1) good batches are drawn from distributions with a mean-shift of…
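
The setting described in the abstract can be simulated with a short sketch. This is not the paper's Sum-of-Squares method: it only generates batches under a simplified version of the model and compares a naive average with a coordinate-wise median of batch means, a standard baseline swapped in for illustration. The Gaussian data, the per-user offset of norm `shift`, the adversary's strategy, and all function names are assumptions.

```python
import math
import random
import statistics


def simulate_batches(N, n, d, eps, shift, mu, seed=0):
    """Return N batches, each a list of n points in R^d.

    Assumptions (not from the paper): good users sample a unit-variance
    Gaussian centered at mu plus a per-user offset of Euclidean norm
    `shift`; adversarial users all report one far-away point.
    """
    rng = random.Random(seed)
    num_bad = int(eps * N)
    batches = []
    for user in range(N):
        if user < num_bad:
            # Adversarial batch: every sample is mu shifted by 10 per coordinate.
            batch = [[m + 10.0 for m in mu] for _ in range(n)]
        else:
            direction = [rng.gauss(0, 1) for _ in range(d)]
            norm = math.sqrt(sum(x * x for x in direction))
            offset = [shift * x / norm for x in direction]
            batch = [
                [mu[j] + offset[j] + rng.gauss(0, 1) for j in range(d)]
                for _ in range(n)
            ]
        batches.append(batch)
    return batches


def batch_means(batches):
    """Average each batch coordinate by coordinate."""
    return [[statistics.fmean(col) for col in zip(*batch)] for batch in batches]


def coordinatewise_median(points):
    """Median of each coordinate across the given points."""
    return [statistics.median(col) for col in zip(*points)]


def l2_error(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


mu = [0.0] * 5
means = batch_means(simulate_batches(N=200, n=50, d=5, eps=0.1, shift=0.1, mu=mu))
naive = [statistics.fmean(col) for col in zip(*means)]
robust = coordinatewise_median(means)
print("naive error:", l2_error(naive, mu))
print("median-of-batch-means error:", l2_error(robust, mu))
```

Averaging within a batch shrinks the noise of each good user's report, so the adversarial batch means stand out and a robust aggregate over users can discount them, whereas the naive average is pulled toward the adversarial point.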

Taxonomy

Topics: Machine Learning and Algorithms · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques