Good Allocations from Bad Estimates

Sílvia Casacuberta; Moritz Hardt

arXiv:2601.05597 · cs.LG · January 12, 2026

Open Access · 3 Reviews

TL;DR

This paper shows that, for natural distributions of treatment effects, near-optimal treatment allocations need only O(M/ε) samples rather than the O(M/ε^2) required to estimate every treatment effect precisely. Coarse estimates suffice for allocation, and a flexible budget reduces the sample requirement further.

Contribution

The authors introduce a method that achieves near-optimal treatment allocations with fewer samples by relying on coarse estimates, contrasting with the standard requirement of precise effect estimation.

Findings

01

Achieves the same total treatment effect with O(M/ε) samples instead of O(M/ε^2)

02

Coarse estimates are sufficient for near-optimal allocations

03

Budget flexibility further reduces sample complexity
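
As a rough worked comparison of the two bounds in finding 01, assuming the constants and logarithmic factors hidden by the $O(\cdot)$ notation are comparable (an assumption; they are not stated here), the ratio of the sample requirements is $1/ϵ$:

$$\frac{M/ϵ^{2}}{M/ϵ} = \frac{1}{ϵ}$$

For illustrative values $M = 100$ and $ϵ = 0.1$ (chosen here, not taken from the paper), this gives $M/ϵ^{2} = 10{,}000$ versus $M/ϵ = 1{,}000$, a tenfold difference.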

Abstract

Conditional average treatment effect (CATE) estimation is the de facto gold standard for targeting a treatment to a heterogeneous population. The method estimates treatment effects up to an error $ϵ > 0$ in each of $M$ different strata of the population, targeting individuals in decreasing order of estimated treatment effect until the budget runs out. In general, this method requires $O(M/ϵ^{2})$ samples. This is best possible if the goal is to estimate all treatment effects up to an $ϵ$ error. In this work, we show how to achieve the same total treatment effect as CATE with only $O(M/ϵ)$ samples for natural distributions of treatment effects. The key insight is that coarse estimates suffice for near-optimal treatment allocations. In addition, we show that budget flexibility can further reduce the sample complexity of allocation. Finally, we evaluate our…
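
The estimate-then-rank procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' proposed method or their experiments: the function names, the Gaussian noise model, the uniform distribution of true effects, and the treatment of the budget as a fixed number `K` of strata are all hypothetical choices made here for the example.

```python
import random


def estimate_effects(true_effects, n_per_stratum, noise_sd, rng):
    """Estimate each stratum's effect as the mean of n noisy observations.

    Assumption: observations are the true effect plus Gaussian noise.
    """
    estimates = []
    for tau in true_effects:
        draws = [rng.gauss(tau, noise_sd) for _ in range(n_per_stratum)]
        estimates.append(sum(draws) / n_per_stratum)
    return estimates


def allocate_top_k(estimates, budget):
    """Return the (sorted) indices of the `budget` strata with the largest estimates."""
    order = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)
    return sorted(order[:budget])


def total_effect(true_effects, chosen):
    """Sum the true effects of the treated strata."""
    return sum(true_effects[i] for i in chosen)


rng = random.Random(0)
M, K = 50, 10  # hypothetical number of strata and budget
true_effects = [rng.uniform(0.0, 1.0) for _ in range(M)]

# Best achievable total effect: rank by the true effects themselves.
optimal = total_effect(true_effects, allocate_top_k(true_effects, K))

# Allocations from coarse (few samples) and fine (many samples) estimates.
coarse_choice = allocate_top_k(estimate_effects(true_effects, 5, 1.0, rng), K)
fine_choice = allocate_top_k(estimate_effects(true_effects, 500, 1.0, rng), K)
coarse = total_effect(true_effects, coarse_choice)
fine = total_effect(true_effects, fine_choice)

print(f"optimal={optimal:.3f} coarse={coarse:.3f} fine={fine:.3f}")
```

Comparing `coarse` and `fine` against `optimal` for different values of `n_per_stratum` gives a feel for how much allocation quality is lost when the per-stratum estimates are imprecise, which is the quantity the abstract's sample-complexity claims concern.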

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The paper makes a clear and elegant theoretical distinction between estimation and allocation. The reduction of the sample complexity from $M/\epsilon^2$ to $M/\epsilon$ is insightful and exciting.
2. Practical relevance: direct implications for RCT and policy design, with a significant reduction in sample cost.
3. The proofs are clean and well-structured, and the theoretical results are rigorous.

Weaknesses

In general, I enjoyed reading the paper a lot. I do not have major concerns.
1. The comparison to bandit best-arm identification could be expanded. The link is conceptually strong. In particular, there are some recent works on good-arm identification. Some ideas are very similar, although they are not in a causal inference setting.
2. Policy implication is strong (“RCTs underpowered for CATE estimation can still yield good allocations”), but guidance on how to detect $\rho$-regularity or compute s

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- The authors provide interesting insights about sample size requirements for optimal treatment allocation.
- The authors substantiate their claims with extensive theoretical analysis.

Weaknesses

Some other related work exists that uses similar insights about the problem of optimally allocating treatment, though often without an extensive theoretical analysis. For example, some work has argued that when trying to find the optimal treatment allocation, accurate CATE estimation is not always the most effective [1,2]. While the authors provide a very extensive theoretical analysis, they only briefly explain the potential impact of their contributions. For example, I find it hard to unders

Reviewer 03 · Rating 8 · Confidence 4

Strengths

1. **Work tackles an important problem** - The authors tackle the important problem of $K$ selection from $M$ groups in a causal setting. Such a problem can be seen across a variety of real-world situations, and is especially prevalent in the world of policy. This can help allocate resources more efficiently and avoid unnecessary experiments.
2. **Analysis is intuitive and clean** - The authors present a clean and intuitive reason why their proposed selector should outperform baselines. By avoid

Weaknesses

1. **Experiments are not extensive** - The experiments in Section 6 are condensed to half a page (with some extra material in the Appendix). As a result, it's hard to understand some of the results. For example, why is the failure percentage not monotonic in $\epsilon$? Presumably, as $\epsilon$ increases, the failure threshold becomes less stringent, so it is surprising that this pattern is exhibited across datasets. Additionally, there is little comparison with the $\frac{1}{\epsilon^2}$ metho

Taxonomy

Topics: Advanced Causal Inference Techniques · Statistical Methods and Inference · Statistical Methods and Bayesian Inference