From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators
Lei Liu, Zhongyi Yu, Hong Wang, Huanshuo Dong, Haiyang Xin, Hongwei Zhao, Bin Li

TL;DR
This paper introduces Skip-Block Routing (SBR), a flexible framework for Transformer-based neural operators that adaptively allocates computational resources to complex regions in PDE solutions, significantly reducing costs and increasing efficiency.
Contribution
The paper proposes SBR, a novel routing mechanism for neural operators that dynamically adjusts processing based on token complexity, improving efficiency without accuracy loss.
Findings
Reduces computational cost by ~50% in FLOPs
Doubles inference speed while maintaining accuracy
Seamlessly integrates into various neural operators
Abstract
In recent years, Neural Operators (NO) have emerged as a popular approach for solving Partial Differential Equations (PDEs). However, their application to large-scale engineering tasks suffers from significant computational overhead. The root of this inefficiency is a fundamental mismatch: current models impose a uniform computational cost, while physical fields exhibit vastly different complexities. For instance, in turbulent flows, intricate vortex regions require deeper network processing than stable flow regions. To address this, we introduce Skip-Block Routing (SBR), a general framework designed for Transformer-based neural operators that can be integrated into their multi-layer architectures. First, SBR uses a routing mechanism to learn the complexity and ranking of tokens, which is then applied during inference.…
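As a rough illustration of the routing idea in the abstract, the sketch below ranks tokens by a complexity score, sends only the top-ranked fraction through a block, and copies the remaining tokens through unchanged. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `top_k_indices`, `skip_block`, `toy_block`, and `keep_ratio` are hypothetical, the scores are hard-coded rather than learned by a router, and the toy block stands in for a Transformer layer.

```python
from typing import Callable, List, Sequence

Token = List[float]


def top_k_indices(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the highest-scoring tokens, returned in original order."""
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def skip_block(
    tokens: List[Token],
    scores: Sequence[float],
    block: Callable[[List[Token]], List[Token]],
    keep_ratio: float,
) -> List[Token]:
    """Apply `block` only to the top-ranked tokens; copy the rest unchanged."""
    keep = top_k_indices(scores, keep_ratio)
    processed = block([tokens[i] for i in keep])
    out = [list(t) for t in tokens]  # skipped tokens pass through as copies
    for i, new_token in zip(keep, processed):
        out[i] = new_token
    return out


def toy_block(selected: List[Token]) -> List[Token]:
    """Hypothetical stand-in for a Transformer block: adds 1 to each feature."""
    return [[x + 1.0 for x in t] for t in selected]


tokens = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
scores = [0.1, 0.9, 0.3, 0.8]  # assumed complexity scores (learned in practice)
out = skip_block(tokens, scores, toy_block, keep_ratio=0.5)
print(out)  # tokens 1 and 3 are updated; tokens 0 and 2 are copied
```

With `keep_ratio=0.5`, the block sees only half of the tokens, which is where the compute saving in such a scheme would come from. How the scores are learned and how the ratio varies across layers are not specified in the text above.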
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
1. The paper identifies the uniform computation bottleneck in existing Neural Operators, which is an important problem.
2. The authors' motivation to link the non-uniform complexity of physical fields to the model's computational resource allocation is reasonable.
3. The SBR framework is designed as a general module, and the paper demonstrates its integration across multiple Transformer-based operators.
4. The paper demonstrates, through experiments, the potential of SBR in reducing FLOPs.
1. The "static routing" design, which determines a fixed importance ranking based only on the initial state, is, in principle, incapable of capturing newly emerging complex regions in dynamic evolution problems. Concurrently, the "hard skip" mechanism copies skipped token features, thereby obstructing the global propagation of physical information (e.g., pressure, heat) through these "simple" regions and compromising physical fidelity.
2. The paper completely omits FLOPs and accuracy comparison
1. The authors present a strong and well-motivated problem statement, clearly identifying the inefficiency arising from uniform computation in neural operators.
2. SBR is an appropriate and well-designed approach that effectively addresses the problem by introducing adaptive token-level computation.
3. The experimental evaluation is extensive, covering a wide range of architectures and PDE benchmarks.
1. The proposed method does not always outperform, or perform comparably to, the baselines. In Table 1, its performance is lower than that of existing models in many cases, which raises questions about the robustness and consistency of the approach.
2. The analysis in Figure 6 seems limited, since the comparison relies on a narrow experimental setting.
- The motivation is clear: as in numerical computing, computational resources should be focused on features that are difficult to model, e.g. sharp gradients.
- The method is simple and modular, and is easy to implement within existing, widely used architectures.
- Efficiency gains in terms of FLOPs and speed are clear.
- Ablation studies, particularly with random routing and MoR, show that the token refinement strategy is actually helping to store useful information.
- I found issues with the presentation and clarity. There are quite a few typos and instances where the in-text citations are not formatted correctly. For example, \cite is used instead of \citep and there is no space between citations and other text. The exposition is also fairly dense and a bit repetitive, while concepts like physical intuition are buried in the details. Figures and tables aren't integrated into the narrative.
- The authors state the importance map is static and fixed. Is this
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Reservoir Computing · Generative Adversarial Networks and Image Synthesis
