Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs
Chun-Wun Cheng, Jiahao Huang, Yi Zhang, Guang Yang, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

TL;DR
The paper introduces Mamba Neural Operator (MNO), a new framework that unifies state-space models and neural operators to improve PDE solving by capturing long-range dependencies and continuous dynamics more effectively than Transformers.
Contribution
MNO provides a formal connection between structured state-space models and neural operators, enhancing PDE solution accuracy and efficiency beyond traditional Transformer approaches.
Findings
MNO outperforms Transformers in capturing long-range dependencies.
MNO significantly improves the accuracy of neural operators for PDEs.
The framework unifies diverse architectures under a common structure.
Abstract
Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to capture intricate dependencies. However, they struggle with representing continuous dynamics and long-range interactions. To overcome these limitations, we introduce the Mamba Neural Operator (MNO), a novel framework that enhances neural operator-based techniques for solving PDEs. MNO establishes a formal theoretical connection between structured state-space models (SSMs) and neural operators, offering a unified structure that can adapt to diverse architectures, including Transformer-based models. By leveraging the structured design of SSMs, MNO captures long-range dependencies and continuous dynamics more effectively than traditional…
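As a rough illustration of the structured state-space idea the abstract refers to, the sketch below discretizes a continuous-time linear system h'(t) = A h(t) + B u(t), y(t) = C h(t) with a zero-order hold and runs it as a recurrence over a sampled scalar input. It is not the MNO architecture: the diagonal real-valued A, the fixed step size `delta`, the function names, and the toy dimensions are all assumptions made for illustration. The sketch is time-invariant, whereas Mamba's selective variant makes the step size, B, and C depend on the input.

```python
import numpy as np

def zoh_discretize(a_diag, b, delta):
    """Zero-order-hold discretization of h'(t) = A h(t) + B u(t), diagonal A (assumed)."""
    a_bar = np.exp(delta * a_diag)          # exp(delta * A), elementwise for diagonal A
    b_bar = (a_bar - 1.0) / a_diag * b      # A^{-1} (exp(delta * A) - I) B
    return a_bar, b_bar

def ssm_scan(a_diag, b, c, u, delta):
    """Run h_k = A_bar h_{k-1} + B_bar u_k, y_k = C h_k over a scalar sequence u."""
    a_bar, b_bar = zoh_discretize(a_diag, b, delta)
    h = np.zeros_like(a_diag)               # hidden state of size N
    ys = np.empty(len(u))
    for k, u_k in enumerate(u):
        h = a_bar * h + b_bar * u_k         # state update; u_k is a scalar sample
        ys[k] = c @ h                       # scalar readout
    return ys

# Hypothetical toy setup: state size N = 4, sequence length L = 64.
rng = np.random.default_rng(0)
a_diag = -np.arange(1.0, 5.0)               # negative entries keep the system stable
b = np.ones(4)
c = rng.standard_normal(4)
u = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
y = ssm_scan(a_diag, b, c, u, delta=0.1)
```

Each input sample is a scalar multiplied into the N-dimensional vector B, so the hidden state has size N regardless of the sequence length L. Because the state carries information across every step, an output can in principle depend on inputs arbitrarily far back, which is the long-range property the abstract emphasizes.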
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The topic explored in this paper is interesting and engaging. 2. The paper offers some theoretical insights that enhance our understanding of operator learning. 3. The experiments conducted are thorough.
1. The paper feels somewhat rushed, which has resulted in several typos that hinder understanding of the content. Here are some specific issues I've noticed: a. In Section 3.1, there is a mistake in one of the formulas ($n=1^N$), which could confuse readers trying to grasp the main concepts. b. In Section 3.2, it’s unclear how $B$, which is an n-dimensional complex vector, and $u(t)$, which is an L-dimensional real vector, are multiplied in Equation 3. This lack of clarity raises questi
1) Theoretical Connection between Mamba and Neural Operator. 2) MNO addresses long-range dependencies and continuous dynamics better than a transformer.
1) The presentation could be improved by addressing notational inconsistencies. 2) Lack of novelty compared to Vision Mamba. 3) Weak baselines: SOTA transformers such as Transolver, Latent Neural Operator, DPOT, and LSM are not included. 4) Missing comparisons of model parameters, training time, and inference time, and missing MNO hyperparameters.
Mathematical derivations are complete, coherent, and consistent. Combines the mathematical frameworks of Mamba (Gu, 2023) and Neural Operator (Kovachki, 2023).
(1) This paper sets the Transformer as a target for comparison in studying neural operators, disregarding all other neural operators. However, the Transformer is neither mainstream nor representative in the neural operator field (see Questions part for details). This setting limits the value of the conclusions drawn in the paper. (2) The paper uses RMSE as a metric for evaluating PDE solutions but fails to address critical properties of PDE solving. This undermines the persuasiveness of the pro
1. The summary of related work is detailed. 2. The paper establishes a theoretical connection between the Mamba architecture and neural operators. 3. Although not comprehensive, the paper includes experiments on several datasets and compares the proposed method with some baseline models.
1. **Presentation Issues:** The paper lacks line numbers, which are essential for precise referencing during reviews. Additionally, the appendix is included in the supplementary materials rather than in the main document, which hampers the ease of accessing critical information. 2. **Writing Clarity:** The manuscript’s writing requires enhancement, as some sentences lack logical coherence. For instance, the statement: *“While Transformers dominate in areas like foundational models and computer v
Taxonomy
Topics: Neural Networks and Applications
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
