Transformers Simulate MLE for Sequence Generation in Bayesian Networks
Yuan Cao, Yihan He, Dennis Wu, Hong-Yu Chen, Jianqing Fan, Han Liu

TL;DR
This paper studies whether transformers can autoregressively generate sequences from a Bayesian network by performing in-context maximum likelihood estimation (MLE), and shows that they can do so both in theory and in practice.
Contribution
It shows that a transformer can estimate a Bayesian network's conditional probabilities from context and generate new sequences from those estimates, which improves our understanding of transformers as probabilistic models.
Findings
Transformers can estimate Bayesian network conditional probabilities.
Transformers can autoregressively generate sequences based on estimated probabilities.
Effective training of such transformers is achievable in practice.
Abstract
Transformers have achieved significant success in various fields, notably excelling in tasks involving sequential data like natural language processing. Despite these achievements, the theoretical understanding of transformers' capabilities remains limited. In this paper, we investigate the theoretical capabilities of transformers to autoregressively generate sequences in Bayesian networks based on in-context maximum likelihood estimation (MLE). Specifically, we consider a setting where a context is formed by a set of independent sequences generated according to a Bayesian network. We demonstrate that there exists a simple transformer model that can (i) estimate the conditional probabilities of the Bayesian network according to the context, and (ii) autoregressively generate a new sample according to the Bayesian network with estimated conditional probabilities. We further demonstrate…
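To make the two steps in the abstract concrete, here is a minimal sketch of the target computation itself, not of the paper's transformer construction. It assumes discrete variables, a known graph structure, and variables indexed in topological order; the chain X0 -> X1 -> X2, the toy context, and the names `PARENTS`, `estimate_cpts`, and `generate` are hypothetical choices made for illustration. For discrete variables the MLE of each conditional probability is the relative frequency of a value among the context sequences that share the same parent configuration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical binary chain X0 -> X1 -> X2 (an assumption, not from the paper).
# Maps each variable index to the indices of its parents.
PARENTS = {0: (), 1: (0,), 2: (1,)}


def estimate_cpts(context, parents):
    """Count-based MLE of P(x_i | parents of x_i) from the context sequences."""
    counts = {i: defaultdict(Counter) for i in parents}
    for seq in context:
        for i, pa in parents.items():
            pa_vals = tuple(seq[j] for j in pa)
            counts[i][pa_vals][seq[i]] += 1
    cpts = {}
    for i, table in counts.items():
        # Normalise the counts within each parent configuration.
        cpts[i] = {
            pa_vals: {v: c / sum(ctr.values()) for v, c in ctr.items()}
            for pa_vals, ctr in table.items()
        }
    return cpts


def generate(cpts, parents, rng):
    """Sample one sequence variable by variable, in index order."""
    seq = []
    for i in sorted(parents):
        pa_vals = tuple(seq[j] for j in parents[i])
        dist = cpts[i][pa_vals]  # raises KeyError if this parent configuration never appeared in the context
        values = list(dist)
        weights = [dist[v] for v in values]
        seq.append(rng.choices(values, weights=weights, k=1)[0])
    return seq


# Toy context: six independent sequences (made-up data for illustration).
context = [[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 1], [0, 1, 1], [1, 0, 0]]
cpts = estimate_cpts(context, PARENTS)
sample = generate(cpts, PARENTS, random.Random(0))
```

With this context, `cpts[1][(0,)]` assigns probability 2/3 to X1 = 0, because two of the three sequences with X0 = 0 have X1 = 0. The sketch uses no smoothing, so a value never observed under a given parent configuration gets probability zero.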
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training
