Solving Empirical Bayes via Transformers

Anzo Teh; Mark Jabbour; Yury Polyanskiy

arXiv:2502.09844·cs.LG·May 29, 2025

TL;DR

This paper introduces a transformer-based approach to solving the classical empirical Bayes problem for Poisson means, demonstrating both theoretical guarantees and practical advantages over traditional methods.

Contribution

The paper applies transformers to empirical Bayes, provides a theoretical analysis, and shows that small models outperform classical algorithms on both real-world and synthetic data.

Findings

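1. Transformers achieve vanishing regret, relative to an oracle that knows the prior, as the dimension increases.

2. Small models (around 100k parameters) outperform the best classical algorithm, NPMLE, in both runtime and validation loss.

3. Transformers internally work differently from traditional estimators.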

Abstract

This work applies modern AI tools (transformers) to one of the oldest statistical problems: estimating Poisson means in the empirical Bayes (Poisson-EB) setting. In Poisson-EB, a high-dimensional mean vector $\theta$ (with i.i.d. coordinates sampled from an unknown prior $\pi$) is estimated on the basis of $X \sim \mathrm{Poisson}(\theta)$. A transformer model is pre-trained on a set of synthetically generated pairs $(X, \theta)$ and learns to do in-context learning (ICL) by adapting to the unknown $\pi$. Theoretically, we show that a sufficiently wide transformer can achieve vanishing regret with respect to an oracle estimator that knows $\pi$ as the dimension grows to infinity. Practically, we discover that already very small models (100k parameters) are able to outperform the best classical algorithm (non-parametric maximum likelihood, or NPMLE) both in runtime and validation loss, which we compute on…
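To make the Poisson-EB setup concrete, the following is a minimal sketch of the problem and its baselines, not the paper's transformer and not necessarily the authors' NPMLE solver. It assumes a hypothetical two-point prior, a hypothetical fixed grid, and a simple EM iteration over grid weights as a stand-in for NPMLE. The oracle is the posterior mean under the true prior, and regret is measured here as the gap in empirical mean squared error to that oracle. All function names, the prior, the grid, and the sample size are illustrative assumptions.

```python
import numpy as np

def poisson_likelihood(x, atoms):
    """Matrix L[i, j] = P(X = x[i] | theta = atoms[j]); atoms must be positive."""
    x = np.asarray(x)
    atoms = np.asarray(atoms, dtype=float)
    # log(k!) for k = 0..max(x), built from a cumulative sum of logs
    log_fact = np.concatenate([[0.0], np.cumsum(np.log(np.arange(1, x.max() + 1)))])
    log_lik = x[:, None] * np.log(atoms[None, :]) - atoms[None, :] - log_fact[x][:, None]
    return np.exp(log_lik)

def posterior_mean(x, atoms, weights):
    """Bayes estimator E[theta | X = x] under a discrete prior (atoms, weights)."""
    lik = poisson_likelihood(x, atoms)
    return (lik @ (weights * atoms)) / (lik @ weights)

def npmle_em(x, grid, n_iter=200):
    """Grid-based EM approximation to the NPMLE of the prior (assumed stand-in)."""
    lik = poisson_likelihood(x, grid)
    w = np.full(len(grid), 1.0 / len(grid))
    for _ in range(n_iter):
        resp = lik * w                            # unnormalised posterior over grid points
        resp /= resp.sum(axis=1, keepdims=True)   # E-step: responsibilities
        w = resp.mean(axis=0)                     # M-step: update prior weights
    return w

def mse(estimate, truth):
    return float(np.mean((estimate - truth) ** 2))

rng = np.random.default_rng(0)

# Hypothetical prior pi: theta is 1 with probability 0.7 and 5 with probability 0.3.
atoms = np.array([1.0, 5.0])
weights = np.array([0.7, 0.3])

n = 2000
theta = rng.choice(atoms, size=n, p=weights)   # i.i.d. coordinates from pi
x = rng.poisson(theta)                         # X ~ Poisson(theta), coordinate-wise

oracle = posterior_mean(x, atoms, weights)     # estimator that knows pi

grid = np.linspace(0.1, 15.0, 60)              # hypothetical support grid
w_hat = npmle_em(x, grid)
eb = posterior_mean(x, grid, w_hat)            # plug-in EB estimator, pi unknown

mse_oracle = mse(oracle, theta)
mse_mle = mse(x, theta)                        # naive estimator: theta_hat = X
mse_eb = mse(eb, theta)

print("regret of MLE:       ", mse_mle - mse_oracle)
print("regret of grid NPMLE:", mse_eb - mse_oracle)
```

In this sketch the empirical Bayes estimator sees only `x`, never `theta` or the true prior, which is the sense in which an estimator must adapt to the unknown $\pi$. The transformer described in the abstract plays the role of `eb` here, but is trained on synthetic $(X, \theta)$ pairs rather than fitted by EM.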

Taxonomy

Topics: Neural Networks and Applications

Methods: Sparse Evolutionary Training