Diffusion-Inspired Reconfiguration of Transformers for Uncertainty Calibration

Manh Cuong Dao; Quang Hung Pham; Phi Le Nguyen; Thao Nguyen Truong; Bryan Kian Hsiang Low; Trong Nghia Hoang

arXiv:2602.08920 · cs.LG · May 14, 2026

TL;DR

This paper introduces a diffusion-inspired reconfiguration of transformers that models feature transformations as probabilistic mappings, enabling principled uncertainty propagation and improved calibration in vision and language tasks.

Contribution

The paper proposes a diffusion-inspired probabilistic framework for transformers that improves uncertainty calibration without sacrificing predictive performance.

Findings

01. Achieves superior uncertainty calibration on vision and language benchmarks.

02. Maintains original predictive accuracy while propagating uncertainty.

03. Models feature transformations as probabilistic mappings resembling diffusion processes.
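
For readers unfamiliar with how calibration is usually scored, the following is a minimal sketch of binned expected calibration error (ECE), a common calibration metric. The summary above does not state which metric the paper reports, so treat this as a generic illustration; the function name `expected_calibration_error`, the `n_bins` parameter, and the equal-width half-open binning are assumptions made here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct:     booleans, True where the prediction was right.
    Bins are equal-width and half-open, [k/n_bins, (k+1)/n_bins); a
    confidence of exactly 1.0 is placed in the last bin (an assumption
    of this sketch).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, bool(hit)))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# A model that says "75% sure" and is right 3 times out of 4 is calibrated.
calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
# A model that says "100% sure" but is right only once out of 4 is not.
overconfident = expected_calibration_error([1.0] * 4, [True, False, False, False])
```

A lower value means stated confidence tracks observed accuracy more closely; the metric says nothing about accuracy itself, which is why finding 02 is reported separately.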

Abstract

Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. Yet, most existing pre-trained transformers do not have a principled mechanism for uncertainty propagation through their feature transformation stack. In this work, we propose a diffusion-inspired reconfiguration of transformers in which each feature transformation block is modeled as a probabilistic mapping. Composing these probabilistic mappings reveals a probability path that mimics the structure of a diffusion process, transporting data mass from the input distribution to the pre-trained feature distribution. This probability path can then be recompiled on a diffusion process with a unified transition model to enable principled propagation of representation uncertainty throughout the pre-trained model's architecture while maintaining its original predictive…
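
The idea of composing per-block probabilistic mappings into a probability path can be illustrated with a toy Monte Carlo sketch. This is not the paper's method: the scalar features, the affine-plus-Gaussian-noise block, and the names `make_block` and `propagate` are all hypothetical choices made here, and the paper's unified transition model is not reproduced. The sketch only shows how pushing samples through a stack of stochastic blocks yields one distribution per layer, and how uncertainty accumulates along that path.

```python
import random
import statistics


def make_block(weight, bias, noise_std):
    """Toy stochastic block (hypothetical): x -> N(weight * x + bias, noise_std^2)."""
    def block(x, rng):
        return rng.gauss(weight * x + bias, noise_std)
    return block


def propagate(x0, blocks, n_samples=2000, seed=0):
    """Push n_samples copies of x0 through the blocks in order.

    Returns a list of (mean, std) pairs, one per block: a sampled
    summary of the probability path induced by composing the mappings.
    """
    rng = random.Random(seed)
    samples = [x0] * n_samples
    path = []
    for block in blocks:
        samples = [block(x, rng) for x in samples]
        path.append((statistics.fmean(samples), statistics.pstdev(samples)))
    return path


# Three identity-mean blocks, each injecting noise with std 0.5:
# the spread of the feature distribution grows layer by layer.
noisy_path = propagate(1.0, [make_block(1.0, 0.0, 0.5) for _ in range(3)])

# With zero noise the stack collapses to a deterministic network.
exact_path = propagate(1.0, [make_block(2.0, 1.0, 0.0), make_block(1.0, 0.0, 0.0)])
```

In the zero-noise case every sample follows the same trajectory, which mirrors the situation the abstract describes for standard pre-trained transformers: there is a feature path, but no uncertainty is carried along it.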
