Learning Physical Simulation with Message Passing Transformer
Zeyi Xu, Yifei Li

TL;DR
This paper introduces a novel Message Passing Transformer architecture for physical simulation, combining graph neural networks, a specialized attention mechanism, and a Fourier-based loss to improve long-term accuracy in dynamical systems.
Contribution
The paper presents a universal GNN-based architecture with a new attention mechanism and Fourier loss, enhancing simulation accuracy and efficiency over existing methods.
Findings
Achieves significant accuracy improvements in long-term dynamical system simulations.
Introduces Hadamard-Product Attention for fine-grained feature focus.
Employs Graph Fourier Loss for balanced energy component optimization.
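The idea of a loss defined over graph frequencies can be illustrated with a minimal sketch. This summary does not give the paper's exact Graph Fourier Loss formula, so the inverse-energy weighting below is an assumption made purely for illustration. The names `graph_fourier_basis` and `spectral_weighted_mse` are hypothetical. Only the graph Fourier transform itself (projection onto the eigenvectors of the graph Laplacian) is standard.

```python
import numpy as np

def graph_fourier_basis(adj):
    """Eigendecomposition of the combinatorial Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    # eigh returns ascending eigenvalues and orthonormal eigenvector columns
    eigvals, eigvecs = np.linalg.eigh(lap)
    return eigvals, eigvecs

def spectral_weighted_mse(pred, target, eigvecs, eps=1e-8):
    """Hypothetical loss (NOT the paper's formula): squared error per graph
    frequency, reweighted by the inverse energy of the target so that
    low-energy components still contribute to the total."""
    pred_hat = eigvecs.T @ pred        # graph Fourier transform, shape (N, F)
    target_hat = eigvecs.T @ target
    err = ((pred_hat - target_hat) ** 2).sum(axis=1)   # error per frequency
    energy = (target_hat ** 2).sum(axis=1)             # target energy per frequency
    weights = 1.0 / (energy + eps)                     # assumed weighting scheme
    weights = weights / weights.sum()                  # normalise weights to sum to 1
    return float((weights * err).sum())

# Toy example: a path graph on 4 nodes with 2 features per node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
eigvals, eigvecs = graph_fourier_basis(adj)

rng = np.random.default_rng(0)
target = rng.normal(size=(4, 2))
pred = target + 0.1 * rng.normal(size=(4, 2))
loss = spectral_weighted_mse(pred, target, eigvecs)
```

Because the eigenvector matrix is orthonormal, an unweighted sum of squared errors in the spectral domain equals the ordinary sum of squared errors over nodes. Any balancing effect therefore comes entirely from the per-frequency weights, which is where a loss of this kind would differ from plain MSE.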
Abstract
Machine learning methods for physical simulation have achieved significant success in recent years. We propose a new universal architecture based on Graph Neural Networks, the Message Passing Transformer, which incorporates a Message Passing framework, employs an Encoder-Processor-Decoder structure, and applies Graph Fourier Loss as the loss function for model optimization. To take advantage of the past message passing state information, we propose Hadamard-Product Attention to update the node attribute in the Processor. Hadamard-Product Attention is a variant of Dot-Product Attention that focuses on more fine-grained semantics and assigns attention weights over each feature dimension rather than over each position in the sequence relative to the others. We further introduce Graph Fourier Loss (GFL) to balance high-energy and low-energy components. To improve time performance, we…
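The contrast the abstract draws between the two attention variants can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a single query vector `q` attending over `T` past states with keys `K` and values `V`. It omits learned projections and multiple heads, and it applies no scaling to the Hadamard scores. The function names are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q, K, V):
    """Standard form: one scalar score per position, softmax over positions."""
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)          # shape (T,)
    weights = softmax(scores, axis=0)    # weights sum to 1 across the T positions
    return weights @ V, weights          # output shape (d,)

def hadamard_product_attention(q, K, V):
    """Assumed form: elementwise scores, softmax over the feature dimension."""
    scores = q[None, :] * K              # Hadamard product, shape (T, d)
    weights = softmax(scores, axis=-1)   # each row sums to 1 across the d features
    return (weights * V).sum(axis=0), weights   # output shape (d,)

rng = np.random.default_rng(0)
T, d = 3, 4                              # 3 past states, 4 feature dimensions
q = rng.normal(size=d)
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

out_dot, w_dot = dot_product_attention(q, K, V)
out_had, w_had = hadamard_product_attention(q, K, V)
print(w_dot.shape, w_had.shape)          # (3,) (3, 4)
```

The shapes of the weights show the difference: dot-product attention yields one weight per position, whereas the Hadamard variant yields one weight per position and feature, normalised across features.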
Peer Reviews
Decision · Submitted to ICLR 2025
S1) Novel Graph Fourier Loss which helps learn complex physical phenomena effectively across the energy spectrum of the system. This helps avoid using the Graph Fourier Transform in both training and inference.
S2) Modified Attention mechanism focusing on obtaining importance of features as scores by using softmax along the dimension and not along the sequence. This is one of the major contributors to the results as the graph structure helps with message passing/interaction and hence softmax can
W1) A comparison between the current method and previous methods in terms of wall-clock time and memory footprint has not been included (the authors mention this as a limitation, and also in Section F of the appendix). A thorough quantitative analysis of time and memory for each component would be useful for current and future research.
W2) Although the GFL seems very effective, how one arrives at the formulation is not very clear. How does one arrive at the expression of $
Where the paper shines is the theoretical background of its approach: the entire model architecture and the intricacies of its formulation are described in detail, allowing an in-depth look at the architecture, at the notable differences from preceding work such as the Hadamard-Product Attention, and at the Graph Fourier Loss. As far as the reviewer can tell, the architecture should be fully reproducible from this exposition.
Where this paper falls short in its current form is the evaluation, and its embedding into present literature. Maybe slightly too focussed on PINN literature, it misses two landmark works of the past year:
* The Universal Physics Transformers of Alkin et al., which also have a GNN-core and hence fall squarely into the category of a GNN-Transformer hybrid like the presented architecture
* Poseidon: Efficient Foundation Models for PDEs by Herde et al., which is also built to spatio-temporally ev
- The proposed Hadamard-Product Attention offers a fine-grained approach to attention by assigning weights to each feature dimension. The experiments show that it brings an improvement over traditional Dot-Product Attention.
- The application of Graph Fourier Loss to balance spectral components is novel, leveraging graph signal processing to enhance model accuracy over extended rollouts in physical simulations.
- The computational requirements of the proposed method are significantly greater than all the baselines. Moreover, the computational cost for precomputing Laplacian eigenvectors is not discussed.
- From my understanding, the precomputation of Laplacian eigenvectors is not feasible for dynamic graphs that undergo frequent topological changes, such as some of the datasets regarding dynamic flags proposed in the MGN paper.
- The ablation study in Table 2 is only conducted on the CylinderFlow datas
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Graph Neural Network · Residual Connection · Multi-Head Attention
