Many Minds from One Model: Bayesian-Inspired Transformers for Population Diversity
Diji Yang, Yi Zhang

TL;DR
This paper introduces Population Bayesian Transformers (B-Trans), a method that samples diverse yet coherent transformer instances from a single pre-trained model by injecting stochasticity into its normalization layers, motivated by the way diverse individual behaviors give rise to population-level intelligence.
Contribution
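The paper proposes a Bayesian-inspired posterior proxy, built by injecting stochasticity into normalization layers, that yields diverse transformer instances from one pre-trained model without the cost of training a full Bayesian neural network. The authors report improved response diversity and task performance.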
Findings
B-Trans generates diverse yet coherent model instances.
B-Trans improves response diversity in zero-shot tasks.
B-Trans outperforms deterministic baselines in task performance.
Abstract
Despite their scale and success, modern transformers are usually trained as single-minded systems: optimization produces a deterministic set of parameters, representing a single functional hypothesis about the data. Motivated by the analogy to human populations, in which population-level intelligence emerges from diverse individual behaviors, we propose Population Bayesian Transformers (B-Trans), which enable sampling diverse yet coherent transformer large language model instances (hereafter referred to as a 'mind') from a single pre-trained LLM. B-Trans introduces a Bayesian-inspired posterior proxy by injecting stochasticity directly into normalization layers, avoiding the prohibitive cost of training full Bayesian neural networks. Sampling from this proxy yields a population of minds with diverse behaviors while maintaining general competence. During the generation of each response,…
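To make the idea of "injecting stochasticity into normalization layers" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that a "mind" is obtained by adding Gaussian noise to the gain vector of a layer normalization; the noise distribution, the scale `sigma`, the choice of perturbing the gain rather than another parameter, and the helper name `sample_mind` are all illustrative assumptions not stated in the abstract.

```python
import math
import random


def layer_norm(x, gain, bias, eps=1e-5):
    """Standard layer normalization over a single feature vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gain, bias)]


def sample_mind(gain, sigma, seed):
    """Hypothetical helper: draw one perturbed gain vector (a "mind").

    Assumption: Gaussian noise centered on the pre-trained gain.
    The paper's actual noise model is not given in this summary.
    """
    rng = random.Random(seed)
    return [g + rng.gauss(0.0, sigma) for g in gain]


# Illustrative "pre-trained" parameters and input, not real model values.
gain = [1.0, 1.0, 1.0, 1.0]
bias = [0.0, 0.0, 0.0, 0.0]
x = [0.5, -1.0, 2.0, 0.25]

# Deterministic baseline: the single set of trained parameters.
baseline = layer_norm(x, gain, bias)

# A small population: each seed gives a different perturbed normalization layer.
minds = [sample_mind(gain, sigma=0.05, seed=s) for s in range(3)]
outputs = [layer_norm(x, g, bias) for g in minds]

for i, out in enumerate(outputs):
    print(f"mind {i}:", [round(v, 3) for v in out])
```

With `sigma=0.0` the sampled gain equals the pre-trained gain and the baseline is recovered, so the noise scale controls how far each sampled instance departs from the original model. Only the small normalization parameters are perturbed in this sketch, which is one way to read the abstract's point about avoiding the cost of a full Bayesian neural network.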
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
