Transformers Simulate MLE for Sequence Generation in Bayesian Networks

Yuan Cao; Yihan He; Dennis Wu; Hong-Yu Chen; Jianqing Fan; Han Liu

arXiv:2501.02547 · stat.ML · July 9, 2025

TL;DR

This paper studies whether transformers can autoregressively generate sequences from a Bayesian network using in-context maximum likelihood estimation (MLE): the network's conditional probabilities are estimated from a context of independent sample sequences, and a new sequence is then sampled from the estimated network.

Contribution

The paper shows that there exists a simple transformer that estimates a Bayesian network's conditional probabilities from the context and autoregressively generates a new sample from those estimates, and that such transformers can be trained effectively in practice.

Findings

01. Transformers can estimate Bayesian network conditional probabilities.

02. Transformers can autoregressively generate sequences based on estimated probabilities.

03. Effective training of such transformers is achievable in practice.

Abstract

Transformers have achieved significant success in various fields, notably excelling in tasks involving sequential data like natural language processing. Despite these achievements, the theoretical understanding of transformers' capabilities remains limited. In this paper, we investigate the theoretical capabilities of transformers to autoregressively generate sequences in Bayesian networks based on in-context maximum likelihood estimation (MLE). Specifically, we consider a setting where a context is formed by a set of independent sequences generated according to a Bayesian network. We demonstrate that there exists a simple transformer model that can (i) estimate the conditional probabilities of the Bayesian network according to the context, and (ii) autoregressively generate a new sample according to the Bayesian network with estimated conditional probabilities. We further demonstrate…
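
To make the two steps named in the abstract concrete, here is a minimal Python sketch of the target computation: count-based MLE of the conditional probabilities from a context of independent sequences, followed by autoregressive sampling of a new sequence. This is not the paper's transformer construction; it is the plain statistical procedure that such a transformer would be emulating. The function names (`fit_mle`, `cond_prob`, `generate`), the assumption that the parent structure is known and given in topological order, the integer-coded discrete states, and the uniform fallback for parent configurations never seen in the context are all illustrative assumptions, not details taken from the paper.

```python
import random
from collections import Counter

def fit_mle(context, parents):
    """Count-based MLE statistics for P(x_m | parents of x_m).

    context: list of sequences, each a list of integer states.
    parents: parents[m] is a tuple of indices j < m (assumed known structure).
    """
    joint = Counter()   # (m, parent_values, value) -> count
    margin = Counter()  # (m, parent_values) -> count
    for seq in context:
        for m, pa in enumerate(parents):
            key = tuple(seq[j] for j in pa)
            joint[(m, key, seq[m])] += 1
            margin[(m, key)] += 1
    return joint, margin

def cond_prob(joint, margin, m, parent_values, value, num_states):
    """MLE estimate: count(value, parent_values) / count(parent_values)."""
    total = margin[(m, parent_values)]
    if total == 0:
        # Parent configuration never seen in the context:
        # fall back to uniform (an assumption of this sketch).
        return 1.0 / num_states
    return joint[(m, parent_values, value)] / total

def generate(joint, margin, parents, num_states, rng):
    """Sample one new sequence variable by variable from the estimates."""
    seq = []
    for m, pa in enumerate(parents):
        key = tuple(seq[j] for j in pa)  # parents were already sampled
        probs = [cond_prob(joint, margin, m, key, v, num_states)
                 for v in range(num_states)]
        seq.append(rng.choices(range(num_states), weights=probs)[0])
    return seq

# Hypothetical example: a three-variable binary chain x0 -> x1 -> x2.
context = [[0, 1, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
parents = [(), (0,), (1,)]
joint, margin = fit_mle(context, parents)

# Three context sequences have x0 = 0, and two of those have x1 = 1,
# so this estimate is the ratio of those two counts.
print(cond_prob(joint, margin, 1, (0,), 1, num_states=2))
print(generate(joint, margin, parents, num_states=2, rng=random.Random(0)))
```

The estimates depend only on the context, so a different context yields a different sampling distribution with no change to the procedure itself, which is the sense in which the estimation is "in-context".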

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training