Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data
Hengyu Fu, Zehao Dou, Jiawei Guo, Mengdi Wang, Minshuo Chen

TL;DR
This paper provides a theoretical foundation for diffusion transformers in modeling spatial-temporal dependencies in sequential data, demonstrating their effectiveness in learning Gaussian process data through approximation guarantees and numerical validation.
Contribution
The paper presents what its authors describe as the first theoretical analysis of diffusion transformers for capturing spatial-temporal dependencies, establishing score approximation and distribution estimation guarantees.
Findings
Diffusion transformers can capture the spatial-temporal dependencies of Gaussian process data.
Theoretical guarantees for learning Gaussian process data are established.
Numerical experiments confirm the theoretical predictions.
Abstract
Diffusion Transformer, the backbone of Sora for video generation, successfully scales the capacity of diffusion models, pioneering new avenues for high-fidelity sequential data generation. Unlike static data such as images, sequential data consists of consecutive data frames indexed by time, exhibiting rich spatial and temporal dependencies. These dependencies represent the underlying dynamic model and are critical for validating the generated data. In this paper, we make the first theoretical step towards understanding how diffusion transformers capture spatial-temporal dependencies. Specifically, we establish score approximation and distribution estimation guarantees of diffusion transformers for learning Gaussian process data with covariance functions of various decay patterns. We highlight how the spatial-temporal dependencies are captured and affect learning efficiency. Our study…
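To make the object of the score approximation guarantee concrete, here is a minimal Python sketch. It is not the paper's construction. It assumes an exponentially decaying temporal kernel, an identity spatial covariance, and a variance-preserving forward process x_t = exp(-t/2) x_0 + sqrt(1 - exp(-t)) eps. The names `decaying_covariance` and `gaussian_score` are hypothetical. For zero-mean Gaussian data with covariance Sigma, the noised marginal is again Gaussian, so the score a network has to approximate is available in closed form.

```python
import numpy as np

def decaying_covariance(n_frames, dim, length_scale=2.0):
    """Hypothetical covariance: exponential decay across time, identity across space."""
    idx = np.arange(n_frames)
    # Temporal kernel exp(-|i - j| / length_scale); positive definite (AR(1)-type).
    temporal = np.exp(-np.abs(idx[:, None] - idx[None, :]) / length_scale)
    # Kronecker product stacks the frames into one long vector of size n_frames * dim.
    return np.kron(temporal, np.eye(dim))

def gaussian_score(x, cov, t):
    """Closed-form score of the noised marginal N(0, alpha_t^2 * cov + sigma_t^2 * I).

    Assumes alpha_t^2 = exp(-t) and sigma_t^2 = 1 - exp(-t) (variance-preserving).
    """
    alpha_sq = np.exp(-t)
    sigma_sq = 1.0 - alpha_sq
    marginal_cov = alpha_sq * cov + sigma_sq * np.eye(cov.shape[0])
    # grad_x log p_t(x) = -marginal_cov^{-1} x for a zero-mean Gaussian.
    return -np.linalg.solve(marginal_cov, x)

rng = np.random.default_rng(0)
cov = decaying_covariance(n_frames=8, dim=2)

# Draw one clean sequence and noise it to time t.
x0 = rng.multivariate_normal(np.zeros(cov.shape[0]), cov)
t = 0.5
xt = np.exp(-t / 2) * x0 + np.sqrt(1.0 - np.exp(-t)) * rng.standard_normal(x0.shape)

score = gaussian_score(xt, cov, t)
```

In this sketch the score is linear in x_t, and the matrix that multiplies x_t inherits its structure from the covariance: a faster decay (smaller `length_scale`) makes the marginal covariance closer to banded.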
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Air Quality Monitoring and Forecasting
Methods: Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Diffusion · Softmax · Gaussian Process · Attention Is All You Need · Position-Wise Feed-Forward Layer · Absolute Position Encodings
