Improved Operator Learning by Orthogonal Attention
Zipeng Xiao, Zhongkai Hao, Bokai Lin, Zhijie Deng, Hang Su

TL;DR
This paper introduces an orthogonal attention mechanism for neural operators. It uses eigendecomposition to improve generalization and reduce overfitting when learning PDE solutions, and it outperforms baselines on six standard benchmark datasets.
Contribution
The paper proposes a novel orthogonal attention method based on eigendecomposition, providing a natural regularization to enhance neural operator training.
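The eigendecomposition view behind this contribution can be written out as a short derivation. This is a sketch under stated assumptions, not the paper's exact formulation. We assume a symmetric kernel $\kappa$ on a domain $\Omega$ with orthonormal eigenfunctions $\psi_i$ and eigenvalues $\mu_i$. We truncate the expansion to $m$ terms and take inner products with respect to the normalized distribution of $N$ mesh points $x_1, \dots, x_N$, so that $\langle \psi_i, f \rangle$ is estimated by an average over those points. The symbols $\kappa$, $m$, $N$, $\Psi$ and $\mathbf{f}$ are our own notation.

```latex
(\mathcal{K}f)(x) = \int_{\Omega} \kappa(x, x')\, f(x')\, \mathrm{d}x',
\qquad
\kappa(x, x') = \sum_{i=1}^{\infty} \mu_i\, \psi_i(x)\, \psi_i(x'),
\qquad
\langle \psi_i, \psi_j \rangle = \delta_{ij}

% Substituting the expansion and truncating to m eigenpairs:
(\mathcal{K}f)(x) = \sum_{i=1}^{\infty} \mu_i\, \psi_i(x)\, \langle \psi_i, f \rangle
\approx \sum_{i=1}^{m} \mu_i\, \psi_i(x)\, \frac{1}{N} \sum_{n=1}^{N} \psi_i(x_n)\, f(x_n)

% Matrix form on the mesh, with \Psi_{ni} = \psi_i(x_n) and \mathbf{f}_n = f(x_n):
\mathcal{K}\mathbf{f} \approx \frac{1}{N}\, \Psi\, \mathrm{diag}(\mu)\, \Psi^{\top} \mathbf{f},
\qquad
\frac{1}{N}\, \Psi^{\top} \Psi \approx I_m
```

In this reading, the orthonormality condition on $\Psi$ is the constraint that orthogonalizing the learned features is meant to enforce.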
Findings
Outperforms baseline models on six benchmark datasets
Reduces overfitting through orthogonalization
Improves generalization across regular and irregular geometries
Abstract
Neural operators, as efficient surrogate models for learning the solutions of PDEs, have received extensive attention in scientific machine learning. Among them, attention-based neural operators have become one of the mainstream approaches in related research. However, existing approaches overfit the limited training data because of the large number of parameters in the attention mechanism. To address this, we develop an orthogonal attention based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions. The orthogonalization naturally imposes a proper regularization effect on the resulting neural operator, which helps resist overfitting and boosts generalization. Experiments on six standard neural operator benchmark datasets comprising both regular and irregular geometries show that our method can outperform competing…
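A minimal NumPy sketch of how orthogonalized features could act as an attention layer. It assumes the truncated expansion $(\mathcal{K}f)(x) \approx \sum_i \mu_i \psi_i(x) \langle \psi_i, f \rangle$, with the inner product estimated as an average over $N$ mesh points.

- The raw features `h` stand in for the output of a hypothetical eigenfunction network.
- The Cholesky whitening in `orthonormalize` is one assumed way to orthogonalize them. A reviewer mentions a Cholesky step, but this is not the authors' released code.
- The names `orthonormalize` and `orthogonal_attention` and the fixed values of `mu` are illustrative. In a trained model, both the network and the eigenvalues would presumably be learned.

```python
import numpy as np


def orthonormalize(h: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Whiten features h (N, m) so that psi.T @ psi / N is (nearly) the identity."""
    n, m = h.shape
    # Empirical second-moment matrix of the features, with a small ridge for stability.
    cov = h.T @ h / n + eps * np.eye(m)
    # Cholesky factor: cov = L @ L.T with L lower-triangular.
    L = np.linalg.cholesky(cov)
    # psi = h @ inv(L).T, computed by solving L @ psi.T = h.T instead of inverting L.
    return np.linalg.solve(L, h.T).T


def orthogonal_attention(h: np.ndarray, f: np.ndarray, mu: np.ndarray,
                         eps: float = 1e-6) -> np.ndarray:
    """Approximate (K f)(x_n) = sum_i mu_i * psi_i(x_n) * <psi_i, f> on N mesh points.

    h:  (N, m) raw features from a hypothetical eigenfunction network
    f:  (N, c) input function values at the same points
    mu: (m,)   eigenvalues (illustrative constants here)
    """
    n = h.shape[0]
    psi = orthonormalize(h, eps)
    # <psi_i, f> estimated as an average over the mesh points: shape (m, c).
    coeffs = psi.T @ f / n
    # Scale each coefficient by its eigenvalue and map back to the mesh: shape (N, c).
    return psi @ (mu[:, None] * coeffs)


rng = np.random.default_rng(0)
h = rng.standard_normal((256, 8))
f = rng.standard_normal((256, 3))
mu = np.linspace(1.0, 0.1, 8)

out = orthogonal_attention(h, f, mu)
psi = orthonormalize(h)
print(out.shape)                                             # (256, 3)
print(np.allclose(psi.T @ psi / 256, np.eye(8), atol=1e-4))  # True
```

The sketch computes `psi.T @ f` first, so its cost grows linearly in N: roughly O(N m c) for the projection plus O(N m²) for the whitening. Softmax attention over all point pairs would instead grow quadratically. If `mu` is set to all ones, the layer reduces to an orthogonal projection of `f` onto the span of the whitened features, so applying it twice gives the same result as applying it once.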
Peer Reviews
Decision: ICML 2024 Spotlight
1. A novel attention mechanism is proposed to address the issue of overfitting, and the experimental sections show improvement over existing attention mechanisms. 2. The ablation study compares the proposed orthogonal attention mechanism with other normalization schemes and shows the advantages of the proposed mechanism. 3. Scaling up the neural network with the proposed orthogonal attention mechanism brings performance improvements, which shows that the proposed mechanism improves generalization…
1. I understand the motivation. The top line in Figure 1 updates the PDE solution, so strong regularization is needed, and that is why the proposed orthogonal attention is incorporated there along with linear attention. Technically, however, both the top line and the bottom line are simply nonlinear functions. Have the authors tried incorporating the proposed attention into both lines to see whether it further improves generalization? 2. The current parametrization requires solving a Chol…
- This work proposes a novel orthogonal attention mechanism for neural operators that provides inherent regularization. The connection to the eigendecomposition of the kernel operator is an original perspective. The authors introduce a two-pathway architecture with eigenfunction approximation and an orthogonal-attention-based solution update. The disentangled design is innovative. - Regularization through orthogonalization of features is an interesting way to mitigate overfitting in neural…
- The motivation of avoiding overfitting through regularization is reasonable. However, the paper lacks experiments that directly demonstrate overfitting in baseline models, which would substantiate the need for orthogonal regularization. Adding such an empirical analysis could strengthen the motivation. - While the eigendecomposition perspective provides insight, the connection to eigenfunctions is mainly conceptual. More theoretical analysis that formally relates the orthogonal attention to spectral proper…
1. The method is easy to understand and has some theoretical backing. 2. The method shows improved performance across many datasets.
1. It is unclear how the paper justifies that the method mitigates overfitting, which is one of its central claims. 2. The effect of the method relative to model size should be studied.
Taxonomy
Topics: Model Reduction and Neural Networks · Machine Learning and ELM · Neural Networks and Applications
