Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms
Alkis Kalavasis, Anay Mehrotra, Manolis Zampetakis, Felix Zhou, Ziyu Zhu

TL;DR
This paper investigates Gaussian mean estimation from coarse data, characterizing when the mean is identifiable under convex partitions and providing efficient algorithms for estimation in such cases.
Contribution
It provides a complete characterization of mean identifiability under convex partitions and introduces efficient algorithms for estimation when the mean is identifiable.
Findings
Identifiability of the mean depends on the structure of the convex partition.
Efficient algorithms are developed for mean estimation under identifiable conditions.
The work resolves open questions about the conditions for identifiability and computational feasibility.
Abstract
Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing $x$. When the coarse samples, roughly speaking, have "low" information, the mean cannot be uniquely recovered from observed samples (i.e., the problem is not identifiable). Recent work by Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21] established that sample-efficient mean estimation is possible when the unknown mean is identifiable and the partition consists of only convex sets. Moreover, they showed that without…
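To make the coarse-data model concrete, here is a minimal sketch. It assumes a hypothetical partition of the space into axis-aligned unit cubes, which is what rounding each coordinate down produces. The function names and the choice of partition are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def coarsen_to_unit_cubes(samples):
    """Return the integer lower corner of the unit cube containing each sample.

    Assumption: the partition is the grid of cubes [k, k+1)^d (coordinate-wise rounding down).
    """
    return np.floor(samples).astype(int)

def draw_coarse_samples(mean, n, seed=0):
    """Draw n Gaussian samples with identity covariance and reveal only their cells."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=mean, scale=1.0, size=(n, len(mean)))
    return x, coarsen_to_unit_cubes(x)

mean = np.array([0.3, -1.2])          # hypothetical true mean
x, cells = draw_coarse_samples(mean, n=5)
# A learner in the coarse model would see only `cells`, never `x`.
```

Each row of `cells` identifies one set of the partition; the estimation task is to recover `mean` from such rows alone.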
Peer Reviews
Decision: ICLR 2026 Poster
The coarse-data model is well motivated and captures realistic phenomena such as rounding, sensor quantization, and economic market friction. The paper makes theoretical contributions by resolving two questions definitively: the identifiability characterization is clean and geometrically intuitive, and the estimator is both sample-efficient and computationally efficient, with time complexity polynomial in the dimension $d$, the inverse of the desired accuracy $1/\epsilon$, and the bit/facet complexity of the descri
The estimator's sample complexity scales as $\widetilde{O}\left( \frac{d}{\alpha^4 \epsilon^2} + \frac{dD^2}{\alpha^4} \right)$, where $D$ is a known bound on $\|\mu\|$. This is strictly worse than Fotakis et al.’s sample complexity of $\widetilde{O}\left(\frac{d}{\alpha^2 \epsilon^2}\right)$, which is independent of $D$. Also, I think claiming the sample complexity in the abstract as $\widetilde{O}\left(\frac{d}{\epsilon^2}\right)$ seems a little misleading, since this holds only in the co
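The gap between the two bounds quoted in this review can be checked with a few lines of arithmetic. The sketch below drops the constants and logarithmic factors hidden by $\widetilde{O}$, so it compares only the leading expressions. The function names and the numeric values are hypothetical, and $\alpha$ is treated simply as a number in $(0, 1]$ because the excerpt does not define it.

```python
def bound_this_paper(d, alpha, eps, D):
    """Leading term d/(alpha^4 eps^2) + d D^2/alpha^4, as quoted in the review."""
    return d / (alpha**4 * eps**2) + d * D**2 / alpha**4

def bound_fkkt(d, alpha, eps):
    """Leading term d/(alpha^2 eps^2), attributed in the review to Fotakis et al."""
    return d / (alpha**2 * eps**2)

# Hypothetical parameter values, chosen only for illustration.
d, alpha, eps, D = 10, 0.5, 0.1, 2.0
ratio = bound_this_paper(d, alpha, eps, D) / bound_fkkt(d, alpha, eps)
```

Algebraically the ratio is $(1 + D^2 \epsilon^2)/\alpha^2$, so under these expressions the gap grows as $\alpha$ shrinks or as $D\epsilon$ grows.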
The paper provides novel, complete solutions to the fundamental question in Gaussian mean estimation from coarse data. The proposed algorithm has optimal sample complexity.
- It establishes that a convex partition is non-identifiable if and only if almost every set is a slab in some common direction.
- It shows that identifiability is a property of the partition structure itself, as it does not depend on the true mean $\mu^{*}$.
- The algorithm achieves the optimal sample complexity of $\tilde{
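The non-identifiable case described above can be illustrated numerically. The following is a sketch under assumptions, not the paper's construction: in two dimensions it takes a hypothetical partition into strips $\{x : k \le x_1 < k+1\}$, which are unbounded along the second coordinate, and compares it with a partition into unit squares. All function names are illustrative.

```python
import math

def normal_cdf(t):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def interval_prob(lo, hi, mu):
    """P(lo <= X < hi) for X ~ N(mu, 1)."""
    return normal_cdf(hi - mu) - normal_cdf(lo - mu)

def strip_probs(mean, ks):
    # Strips {x : k <= x_1 < k+1}: unbounded in x_2, so only mean[0] matters.
    return [interval_prob(k, k + 1, mean[0]) for k in ks]

def square_probs(mean, ks):
    # Unit squares [i, i+1) x [j, j+1): both coordinates of the mean matter.
    return [interval_prob(i, i + 1, mean[0]) * interval_prob(j, j + 1, mean[1])
            for i in ks for j in ks]

ks = range(-3, 4)
mu_a, mu_b = (0.3, 0.0), (0.3, 2.0)   # two hypothetical means differing only in x_2

same_on_strips = strip_probs(mu_a, ks) == strip_probs(mu_b, ks)
same_on_squares = square_probs(mu_a, ks) == square_probs(mu_b, ks)
```

With the strip partition the two means induce identical cell probabilities, so no number of coarse samples can tell them apart; with unit squares the cell probabilities differ.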
- The polynomial time complexity relies on the existence of an efficient sampling oracle for a truncated Gaussian over the observed convex set $P$ (or $P \cap B_{\infty}(0, R)$). While this is achievable in $\mathrm{poly}(d)$ time for polytopes using MCMC (Hit-and-Run) methods, these methods typically have high polynomial dependency and are usually slow in practice.
- The paper specifies the requirement for an efficient sampling oracle (Assumption 1) that outputs an unbiased sample $y \sim \mathcal{N}(\m
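For intuition about what such a sampling oracle returns, here is a minimal sketch that uses naive rejection sampling. This is a stand-in technique chosen for brevity: it is not Hit-and-Run and not the paper's oracle, and it is only practical when the polytope has non-negligible Gaussian mass. The function name, the polytope representation $\{y : Ay \le b\}$, and the example set are assumptions.

```python
import numpy as np

def sample_truncated_gaussian(mean, A, b, rng, max_tries=100_000):
    """Draw y ~ N(mean, I) conditioned on A @ y <= b, by rejection.

    Assumption: the polytope {y : A y <= b} has enough Gaussian mass
    that a proposal is accepted within max_tries draws.
    """
    for _ in range(max_tries):
        y = rng.normal(loc=mean, scale=1.0)   # proposal from the untruncated Gaussian
        if np.all(A @ y <= b):                # accept only if y lies in the polytope
            return y
    raise RuntimeError("acceptance rate too low for rejection sampling")

# Hypothetical example: the unit square [0, 1]^2 written as A y <= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
rng = np.random.default_rng(0)
y = sample_truncated_gaussian(np.array([0.5, 0.5]), A, b, rng)
```

Accepted draws are exact samples from the truncated Gaussian, but the expected number of proposals is the inverse of the set's Gaussian mass, which can be exponentially large in high dimension. That cost is the reason MCMC methods are used instead.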
See main review
Taxonomy
Topics: Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms · Statistical Methods and Inference
