Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework

Zhangzhi Xiong; Haoyi Wu; You Wu; Shuqi Gu; Kan Ren; Kewei Tu

arXiv:2604.26762 · cs.LG · April 30, 2026

TL;DR

This paper investigates applying the Probabilistic Transformer (PT) and its extension, ST-PT, to time series modeling, emphasizing that both can be inspected and programmed as factor graphs.

Contribution

The paper extends PT to ST-PT for time series, explores the properties that make the model programmable, and empirically studies how those properties support prior injection, conditional generation, and Bayesian updates.

Findings

01. ST-PT can incorporate symbolic priors into time series models.

02. External conditions can program factor matrices for conditional generation.

03. MFVI iterations in ST-PT enable principled Bayesian posterior updates.

Abstract

The Probabilistic Transformer (PT) establishes that the Transformer's self-attention plus its feed-forward block is mathematically equivalent to Mean-Field Variational Inference (MFVI) on a Conditional Random Field (CRF). Under this equivalence, the Transformer ceases to be a black-box neural network and becomes a programmable factor graph: graph topology, factor potentials, and the message-passing schedule are all explicit and inspectable primitives that can be engineered. PT was originally developed for natural language; in this report we investigate its potential for time series. We first lift PT into the Spatial-Temporal Probabilistic Transformer (ST-PT) to repair PT's missing channel axis and weak per-step semantics, and adopt ST-PT as a shared cornerstone backbone. We then identify three distinct properties that PT/ST-PT offers as a factor-graph model and derive three Research…
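
To make the phrase "MFVI on a CRF" concrete, the sketch below runs the generic textbook mean-field update on a small, fully connected pairwise CRF with discrete labels. It is not the PT or ST-PT parameterization: the function names (`softmax`, `mean_field`), the single shared pairwise matrix, and the parallel update schedule are all assumptions chosen for illustration. It only shows the general shape of the computation the abstract refers to, where each node's belief is refreshed from its own potential plus messages weighted by the other nodes' current beliefs.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field(unary, pairwise, n_iters=5):
    """Parallel mean-field updates on a fully connected pairwise CRF.

    unary:    (n, k) log-potentials, one row per node (hypothetical layout).
    pairwise: (k, k) log-potentials shared by every node pair, where
              pairwise[a, b] scores label a at the receiving node and
              label b at the sending node (an illustrative assumption).
    Returns an (n, k) array of approximate marginals q.
    """
    q = softmax(unary)  # initialize beliefs from the unary terms alone
    for _ in range(n_iters):
        # Sum of the beliefs of all *other* nodes, for every node at once.
        others = q.sum(axis=0, keepdims=True) - q
        # Expected pairwise score for each candidate label of each node.
        messages = others @ pairwise.T
        # Renormalize: this is the mean-field fixed-point update.
        q = softmax(unary + messages)
    return q

rng = np.random.default_rng(0)
unary = rng.normal(size=(6, 3))   # 6 nodes, 3 labels
pairwise = 0.5 * np.eye(3)        # toy factor that rewards label agreement
q = mean_field(unary, pairwise, n_iters=5)
```

In this toy setting, the pairwise matrix and the number of iterations are ordinary arguments that can be read and edited directly, which is the sense in which a factor-graph view exposes potentials and the message-passing schedule as explicit objects.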
