Transformers as Unrolled Inference in Probabilistic Laplacian Eigenmaps: An Interpretation and Potential Improvements

Aditya Ravuri; Neil D. Lawrence

arXiv:2507.21040 · cs.LG · July 29, 2025

TL;DR

This paper interprets transformers as unrolled inference in a probabilistic Laplacian Eigenmaps model, shows that at initialisation they perform linear dimensionality reduction, and proposes a simple modification that improves validation performance on a language model and a simple vision transformer.

Contribution

It introduces a probabilistic perspective on transformers that connects them to probabilistic Laplacian Eigenmaps (from the ProbDR framework), and it proposes a simple modification, subtracting the identity from the attention matrix, that improves validation performance.

Findings

01 At initialisation, transformers perform "linear" dimensionality reduction.

02 A graph Laplacian term, rather than an attention matrix, arises within the transformer block.

03 Subtracting the identity from the attention matrix improves validation performance on a language model and a simple vision transformer.

Abstract

We propose a probabilistic interpretation of transformers as unrolled inference steps assuming a probabilistic Laplacian Eigenmaps model from the ProbDR framework. Our derivation shows that at initialisation, transformers perform "linear" dimensionality reduction. We also show that within the transformer block, a graph Laplacian term arises from our arguments, rather than an attention matrix (which we interpret as an adjacency matrix). We demonstrate that simply subtracting the identity from the attention matrix (and thereby taking a graph diffusion step) improves validation performance on a language model and a simple vision transformer.
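
The modification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, the `step` parameter, and the exact placement of the update inside a transformer block are all assumptions made for illustration. The sketch treats a row-softmax attention matrix A as an adjacency matrix, forms A - I (the negative of the random-walk graph Laplacian I - A), and applies one explicit graph diffusion step to the token features.

```python
import math


def softmax_rows(scores):
    """Row-wise softmax, so each row of the result sums to 1."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_minus_identity(scores):
    """Return A - I, where A is the row-softmax attention matrix.

    Since every row of A sums to 1, every row of A - I sums to 0,
    which is the defining property of a (negative) graph Laplacian.
    """
    attn = softmax_rows(scores)
    n = len(attn)
    return [
        [attn[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


def diffusion_step(x, scores, step=1.0):
    """One explicit diffusion step: x + step * (A - I) x.

    `step` is a hypothetical step size; with step=1.0 the update
    reduces to A x.
    """
    neg_lap = attention_minus_identity(scores)
    n, d = len(x), len(x[0])
    return [
        [
            x[i][k] + step * sum(neg_lap[i][j] * x[j][k] for j in range(n))
            for k in range(d)
        ]
        for i in range(n)
    ]


# Toy example: 3 tokens with 2 features each (values are arbitrary).
scores = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

neg_lap = attention_minus_identity(scores)
row_sums = [sum(row) for row in neg_lap]  # each is zero up to rounding
x_next = diffusion_step(x, scores, step=0.5)
```

Because the rows of A - I sum to zero, a feature that is constant across tokens is left unchanged by the step, which is the usual behaviour of diffusion on a graph.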

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Multimodal Machine Learning Applications