Variational Neurons in Transformers for Language Modeling

Yves Ruffenach

arXiv:2603.28219 · cs.LG · March 31, 2026


TL;DR

This paper introduces variational neurons into Transformer models for language modeling, enabling internal uncertainty estimation while maintaining strong predictive performance.

Contribution

The paper integrates variational units into Transformer feed-forward computation, so that uncertainty is modeled inside the network without sacrificing predictive accuracy.

Findings

01. Variational neurons integrate stably into Transformers.

02. They preserve strong predictive performance.

03. They produce informative uncertainty signals.
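To make "uncertainty signals" concrete, here is a minimal sketch of one common way to obtain them from a stochastic model: run several sampled forward passes for the same context and decompose the entropy of the averaged prediction. The function names and the decomposition (predictive entropy minus expected entropy as a mutual-information estimate) are assumptions for illustration. This summary does not state how the paper computes its own statistics.

```python
import math

def entropy(p):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def uncertainty_signals(sampled_probs):
    """sampled_probs: S next-token distributions from S stochastic passes
    over the same context (hypothetical input format)."""
    s = len(sampled_probs)
    vocab = len(sampled_probs[0])
    # Average the S distributions to get the Monte Carlo predictive distribution.
    mean_p = [sum(p[v] for p in sampled_probs) / s for v in range(vocab)]
    predictive = entropy(mean_p)
    expected = sum(entropy(p) for p in sampled_probs) / s
    return {
        "predictive_entropy": predictive,
        "expected_entropy": expected,
        # Disagreement between samples: large when passes predict different tokens.
        "mutual_information": predictive - expected,
    }
```

If all sampled passes agree, the mutual-information estimate is zero. It grows as the passes disagree about the next token, which is what makes it usable as a signal of internal uncertainty.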

Abstract

Transformers for language modeling usually rely on deterministic internal computation, with uncertainty expressed mainly at the output layer. We introduce variational neurons into Transformer feed-forward computation so that uncertainty becomes part of the internal computation itself. Concretely, we replace deterministic feed-forward units with local variational units based on EVE while preserving the overall Transformer backbone. We evaluate this design in compact next-token language-modeling settings. We compare deterministic and variational variants with both predictive and probabilistic criteria. Alongside negative log-likelihood, perplexity and accuracy, we analyze calibration, conditional variance, mutual information and latent-usage statistics. The resulting picture is clear. Variational neurons integrate stably into Transformers, preserve strong predictive performance and…
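As a rough illustration of what replacing a deterministic feed-forward unit with a local variational unit can look like, the sketch below implements a generic Gaussian unit with the reparameterization trick in plain Python. It is not the EVE formulation, whose details are not given in this summary. The parameter names (`w_mu`, `w_rho`), the softplus parameterization of the standard deviation, the ReLU activation, and the standard-normal prior are all assumptions.

```python
import math
import random

def softplus(x):
    # Numerically stable softplus, used to keep the standard deviation positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def variational_unit(x, w_mu, b_mu, w_rho, b_rho, rng, sample=True):
    """One stochastic hidden unit. Returns (activation, KL to N(0, 1))."""
    mu = sum(w * xi for w, xi in zip(w_mu, x)) + b_mu
    # Small floor avoids log(0) when the pre-activation is very negative.
    sigma = softplus(sum(w * xi for w, xi in zip(w_rho, x)) + b_rho) + 1e-6
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1).
    eps = rng.gauss(0.0, 1.0) if sample else 0.0
    z = mu + sigma * eps
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), usable as a per-unit regularizer.
    kl = 0.5 * (mu * mu + sigma * sigma - 1.0) - math.log(sigma)
    return max(z, 0.0), kl  # ReLU on the sampled latent

def variational_ffn_layer(x, params, rng, sample=True):
    """Apply a list of units (hypothetical layout: one parameter tuple per unit)."""
    outputs, kl_total = [], 0.0
    for w_mu, b_mu, w_rho, b_rho in params:
        activation, kl = variational_unit(x, w_mu, b_mu, w_rho, b_rho, rng, sample)
        outputs.append(activation)
        kl_total += kl
    return outputs, kl_total
```

With `sample=False` the unit reduces to a deterministic one driven by its mean, which gives a simple way to compare deterministic and stochastic behavior on the same weights.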
