Variational Neurons in Transformers for Language Modeling
Yves Ruffenach

TL;DR
This paper introduces variational neurons into Transformer models for language modeling, enabling internal uncertainty estimation while maintaining strong predictive performance.
Contribution
It replaces the deterministic feed-forward units of a Transformer with local variational units based on EVE, keeping the rest of the backbone unchanged, so that uncertainty is modeled inside the network without sacrificing accuracy.
Findings
Variational neurons integrate stably into Transformers.
They preserve strong predictive performance.
They produce informative uncertainty signals.
Abstract
Transformers for language modeling usually rely on deterministic internal computation, with uncertainty expressed mainly at the output layer. We introduce variational neurons into Transformer feed-forward computation so that uncertainty becomes part of the internal computation itself. Concretely, we replace deterministic feed-forward units with local variational units based on EVE while preserving the overall Transformer backbone. We evaluate this design in compact next-token language-modeling settings. We compare deterministic and variational variants with both predictive and probabilistic criteria. Alongside negative log-likelihood, perplexity and accuracy, we analyze calibration, conditional variance, mutual information and latent-usage statistics. The resulting picture is clear. Variational neurons integrate stably into Transformers, preserve strong predictive performance and…
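To make the idea of a local variational unit concrete, here is a minimal sketch in plain Python. It is not the paper's EVE formulation, whose details are not given in the abstract. It assumes a single neuron with a Gaussian latent, a softplus-parameterized standard deviation, a standard normal prior, and the reparameterization trick. The names `VariationalUnit`, `w_mu`, `w_rho`, `posterior`, and `forward` are hypothetical and chosen only for illustration.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the standard deviation positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class VariationalUnit:
    """Hypothetical variational neuron (an assumption, not the paper's EVE unit).

    A deterministic neuron would output w . x + b. This unit instead outputs
    a sample z ~ N(mu(x), sigma(x)^2) and reports a KL term to a N(0, 1) prior.
    """

    def __init__(self, w_mu, b_mu, w_rho, b_rho):
        self.w_mu, self.b_mu = w_mu, b_mu      # parameters of the mean
        self.w_rho, self.b_rho = w_rho, b_rho  # parameters of the pre-softplus std

    def posterior(self, x):
        mu = sum(w * xi for w, xi in zip(self.w_mu, x)) + self.b_mu
        rho = sum(w * xi for w, xi in zip(self.w_rho, x)) + self.b_rho
        sigma = softplus(rho) + 1e-6
        return mu, sigma

    def forward(self, x, rng, sample=True):
        mu, sigma = self.posterior(x)
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
        z = mu + sigma * rng.gauss(0.0, 1.0) if sample else mu
        # Closed-form KL( N(mu, sigma^2) || N(0, 1) ).
        kl = 0.5 * (mu * mu + sigma * sigma - 1.0) - math.log(sigma)
        return z, kl


# Illustrative values only; they do not come from the paper.
unit = VariationalUnit(w_mu=[0.5, -0.25], b_mu=0.1, w_rho=[0.0, 0.0], b_rho=0.0)
rng = random.Random(0)
x = [1.0, 2.0]

# Repeated stochastic passes give an empirical variance for this input,
# one simple example of an internal uncertainty signal.
samples = [unit.forward(x, rng)[0] for _ in range(2000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print("empirical mean:", mean, "empirical variance:", variance)
```

In a sketch like this, the KL term would be added to the next-token negative log-likelihood during training, and the per-unit variance is one candidate for the latent-usage and conditional-variance statistics mentioned above. How the paper actually defines and weights these quantities is not stated in the abstract.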
