Transformer as an Euler Discretization of Score-based Variational Flow

Huadong Liao

arXiv:2604.23740 · cs.LG · April 28, 2026


TL;DR

This paper presents a theoretical foundation for Transformers by modeling them as forward Euler discretizations of a continuous-time score-based variational flow, unifying the attention and MoE mechanisms.

Contribution

It introduces Score-based Variational Flow (SVFlow), a continuous-time dynamical system that explains the Transformer architecture and its training stability within a unified geometric framework.

Findings

01. Forward Euler discretization of spherical SVFlow recovers the Transformer architecture.

02. Multi-head attention approximates the SVFlow vector field with a von Mises-Fisher (vMF) kernel-smoothed posterior.

03. Experiments show that SVFlow metrics correlate with language model performance.
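The first finding describes each Transformer layer as one forward Euler step of SVFlow on the unit sphere. The abstract also says the residual-normalization block acts as a relaxed retraction back onto the sphere. The sketch below illustrates only that general pattern and makes two assumptions that do not come from the paper. First, it uses plain L2 normalization as an exact retraction. Second, it uses a hypothetical vector field, pull_to_target, in place of the real SVFlow field.

```python
import math
from typing import Callable, List

Vector = List[float]


def normalize(x: Vector) -> Vector:
    """Map a nonzero vector back onto the unit sphere (the retraction)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def euler_sphere_step(x: Vector, field: Callable[[Vector], Vector], h: float) -> Vector:
    """One forward Euler step x + h * v(x), then renormalization.

    The residual update x + h * v(x) generally leaves the sphere;
    dividing by the norm brings it back, like a residual block followed by a norm.
    """
    v = field(x)
    y = [xi + h * vi for xi, vi in zip(x, v)]
    return normalize(y)


# Hypothetical stand-in for the SVFlow field (an assumption, not the paper's field):
# the tangent-space projection of a constant unit target direction, T - <x, T> x.
TARGET = normalize([1.0, 1.0, 0.0])


def pull_to_target(x: Vector) -> Vector:
    dot = sum(a * b for a, b in zip(x, TARGET))
    return [t - dot * a for a, t in zip(x, TARGET)]


# Run 50 "layers" (Euler steps) of step size 0.2, starting from the north pole.
x = normalize([0.0, 0.0, 1.0])
for _ in range(50):
    x = euler_sphere_step(x, pull_to_target, h=0.2)
```

After the loop, x stays on the unit sphere and ends up close to TARGET. In a real Transformer, the normalization layer and its learned gain do this mapping only approximately. That is presumably why the abstract calls it a relaxed retraction rather than an exact one.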

Abstract

Despite the Transformer's dominance across machine learning, its architecture remains largely heuristic and lacks a unified theoretical foundation. We introduce Score-based Variational Flow (SVFlow), a continuous-time dynamical system for representation learning in which the state evolves according to a variational posterior-weighted average of conditional log-likelihood scores, and we provide a principled basis for regularization through variational consistency. We show that forward Euler discretization of spherical SVFlow exactly recovers the Transformer architecture. Multi-head attention approximates the SVFlow vector field via a vMF kernel-smoothed posterior, while MoE/FFN approximates it in a relaxed, network-based way, and the residual-normalization block implements a relaxed retraction that maintains the spherical geometry. This unification explains why attention trains stably without…
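The abstract says attention approximates the SVFlow vector field through a vMF kernel-smoothed posterior. One standard identity sits behind this kind of claim. Suppose queries and keys have unit norm, the prior over keys is uniform, and every component shares one concentration kappa. Then the vMF normalizing constant cancels, and the posterior over keys is exactly softmax(kappa * <q, k_j>). The sketch below checks that identity for a single head. The function names are hypothetical. The abstract does not say how the paper relates kappa to the usual 1/sqrt(d) attention scale.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(x: Vector) -> Vector:
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def vmf_posterior(query: Vector, keys: List[Vector], kappa: float) -> List[float]:
    """Posterior over keys when each key is the mean direction of a vMF component.

    With a uniform prior and a shared kappa, the vMF normalizing constant is the
    same for every component and cancels, leaving weights proportional to
    exp(kappa * <query, key>).
    """
    kernel = [math.exp(kappa * dot(query, k)) for k in keys]
    total = sum(kernel)
    return [w / total for w in kernel]


def softmax_attention_weights(query: Vector, keys: List[Vector], scale: float) -> List[float]:
    """Standard scaled dot-product attention weights (max-subtracted softmax)."""
    logits = [scale * dot(query, k) for k in keys]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector], kappa: float) -> Vector:
    """Kernel-smoothed average of the values under the vMF posterior."""
    weights = vmf_posterior(query, keys, kappa)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


q = normalize([1.0, 0.5, 0.0])
ks = [normalize(k) for k in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.2, 0.3])]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

vmf_w = vmf_posterior(q, ks, kappa=4.0)
attn_w = softmax_attention_weights(q, ks, scale=4.0)
out = attend(q, ks, vs, kappa=4.0)
```

Here vmf_w and attn_w agree to floating-point precision. The sketch leaves out multiple heads and the learned projections that produce q, k and v. It also does not model how the paper handles the MoE/FFN branch.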
