DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion
Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf, Junchi Yan

TL;DR
DIFFormer introduces a scalable diffusion-based transformer model that encodes complex inter-instance dependencies through energy-constrained diffusion processes, improving performance across various learning tasks.
Contribution
The paper proposes a novel energy constrained diffusion model that induces a new class of neural encoders, called DIFFormer, which is scalable and able to learn complex latent structures among instances.
Findings
Superior performance in node classification on large graphs
Effective semi-supervised image/text classification
Accurate spatial-temporal dynamics prediction
Abstract
Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures for learning desired instance representations. To this end, we introduce an energy constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information by their interactions. The diffusion process is constrained by descent criteria w.r.t. a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength among arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed as DIFFormer (diffusion-based…
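The idea of instance states that "progressively incorporate other instances' information" can be illustrated with a minimal sketch of one explicit diffusion step. This is not the authors' implementation: the function name diffusion_step, the step size tau, and the shifted cosine similarity used as the pairwise diffusion strength are all assumptions chosen for illustration, standing in for the closed-form estimates the abstract refers to.

```python
import numpy as np

def diffusion_step(z, tau=0.5, eps=1e-12):
    """One explicit diffusion step over a batch of instance states.

    z   : array of shape (n_instances, dim), the current states
    tau : step size in [0, 1] (hypothetical parameter for this sketch)
    """
    # Assumed similarity: cosine similarity shifted to be non-negative.
    z_unit = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    sim = 1.0 + z_unit @ z_unit.T
    # Row-normalize so each instance's diffusion strengths sum to 1.
    strength = sim / sim.sum(axis=1, keepdims=True)
    # Each new state mixes the old state with a weighted average of all states.
    return (1.0 - tau) * z + tau * (strength @ z)

rng = np.random.default_rng(0)
states = rng.normal(size=(5, 3))
updated = diffusion_step(states)
```

Because every row of strength is non-negative and sums to 1, each updated state is a convex combination of the batch's states, so repeated steps pull the representations toward one another. Setting tau to 0 leaves the states unchanged.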
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Neural Networks and Applications
Methods: Diffusion
