When Attention Beats Fourier: Multi-Scale Transformers for PDE Solving on Irregular Domains

Brandon Yee; Pairie Koh; Jack Rodriguez; Mihir Tekal

arXiv:2605.08318·cs.LG·May 12, 2026

TL;DR

This paper introduces the Multi-Scale Attention Transformer (MSAT) for solving PDEs, reporting higher accuracy and efficiency than existing methods on complex geometries and offering guidance on architecture selection and the effects of regularization.

Contribution

The paper presents MSAT, a novel transformer-based architecture for PDE solving, with comprehensive empirical evaluation and theoretical analysis guiding architecture choice and regularization.

Findings

01

MSAT achieves state-of-the-art accuracy on complex geometry PDE problems.

02

MSAT significantly reduces inference time compared to Mamba-NO.

03

Physics regularization improves performance on diffusion problems but harms chaotic regimes.

Abstract

We study the problem of architecture selection for deep learning models trained to solve partial differential equations (PDEs), asking when transformer-based architectures with learned attention outperform Fourier-domain neural operators. We introduce the Multi-Scale Attention Transformer (MSAT), a deep learning architecture that encodes spatiotemporal solution histories as token sequences and trains end-to-end via a composite supervised objective with optional physics-informed regularization terms. We conduct a comprehensive empirical evaluation against nine baselines (including physics-informed neural networks (PINNs), neural operators (FNO, DeepONet, GNOT), and state-space models (Mamba-NO)) across five benchmark problems from the PINNacle suite, using identical train/test splits and reference data for all methods. MSAT achieves state-of-the-art…
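
The excerpt above does not give the form of the composite objective, so the following is only a minimal sketch of the general idea: a supervised data term plus an optional, weighted physics-residual term. It assumes a 1D diffusion equation u_t = nu * u_xx on a uniform grid with finite-difference derivatives. The names `diffusion_residual`, `composite_loss`, and `physics_weight` are hypothetical and are not taken from the paper.

```python
import numpy as np

def diffusion_residual(u, dx, dt, nu):
    """Finite-difference residual u_t - nu * u_xx at interior grid points.

    u has shape (T, X): T time steps, X spatial points on a uniform grid.
    Assumed PDE for illustration only: 1D diffusion, u_t = nu * u_xx.
    """
    # Forward difference in time, evaluated at interior spatial points.
    u_t = (u[1:, 1:-1] - u[:-1, 1:-1]) / dt
    # Central second difference in space at the earlier time level.
    u_xx = (u[:-1, 2:] - 2.0 * u[:-1, 1:-1] + u[:-1, :-2]) / dx**2
    return u_t - nu * u_xx

def composite_loss(pred, target, dx, dt, nu, physics_weight=0.0):
    """Mean-squared data loss plus an optional physics-residual penalty.

    physics_weight = 0.0 disables the regularizer (purely supervised loss).
    """
    data_term = np.mean((pred - target) ** 2)
    if physics_weight == 0.0:
        return data_term
    physics_term = np.mean(diffusion_residual(pred, dx, dt, nu) ** 2)
    return data_term + physics_weight * physics_term
```

In this sketch, setting `physics_weight=0.0` recovers plain supervised training, which makes it straightforward to toggle the regularizer per problem regime, in line with the finding that physics regularization helps on diffusion problems but harms chaotic ones.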
