Ensemble and Mixture-of-Experts DeepONets For Operator Learning
Ramansh Sharma, Varun Shankar

TL;DR
This paper introduces ensemble and mixture-of-experts DeepONet architectures that enhance expressivity and spatial locality, achieving 2-4x lower relative errors than standard DeepONets on operator learning tasks involving PDEs.
Contribution
The paper proposes novel ensemble and PoU-MoE DeepONet architectures with universal approximation properties, improving accuracy and spatial modeling in operator learning.
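The trunk enrichment behind the ensemble DeepONet can be sketched as a forward pass in which the outputs of several trunks are concatenated into one basis, and the branch network emits one coefficient per enriched basis function. This is a minimal NumPy sketch, assuming concatenation as the ensembling rule and omitting the output bias and all training. `mlp`, `ensemble_deeponet`, and every size below are hypothetical and are not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes, rng):
    """Hypothetical helper: a tanh MLP with fixed random weights (no training)."""
    params = [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]

    def forward(x):
        for i, (W, b) in enumerate(params):
            x = x @ W + b
            if i < len(params) - 1:   # no activation on the output layer
                x = np.tanh(x)
        return x
    return forward

def ensemble_deeponet(branch, trunks, u, y):
    """G(u)(y) ~= sum_k b_k(u) t_k(y), where the trunk basis {t_k} is the
    concatenation of the outputs of every trunk in the ensemble (assumed)."""
    b = branch(u)                                               # (n_funcs, p_total)
    t = np.concatenate([trunk(y) for trunk in trunks], axis=1)  # (n_points, p_total)
    if b.shape[1] != t.shape[1]:
        raise ValueError("branch width must equal the total trunk width")
    return b @ t.T                                              # (n_funcs, n_points)

m, d = 50, 2        # sensors per input function, spatial dimension (illustrative)
p1, p2 = 20, 30     # basis functions contributed by each trunk (illustrative)
branch = mlp([m, 64, p1 + p2], rng)
trunk_a = mlp([d, 64, p1], rng)   # stand-in for a standard trunk
trunk_b = mlp([d, 64, p2], rng)   # stand-in for a second, distinct trunk
u = rng.standard_normal((4, m))   # 4 input functions sampled at m sensors
y = rng.uniform(size=(100, d))    # 100 query points in [0, 1]^2
out = ensemble_deeponet(branch, [trunk_a, trunk_b], u, y)
```

Because the branch width equals the total number of basis functions, adding a trunk in this sketch only widens the last branch layer; the prediction is still a single inner product between branch coefficients and trunk basis values.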
Findings
Ensemble DeepONets achieve 2-4x lower relative errors than standard DeepONets.
PoU-MoE DeepONet promotes spatial locality and sparsity.
Both architectures are proven to be universal approximators.
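The partition-of-unity idea behind the PoU-MoE trunk can be illustrated by blending local expert networks through weights w_i(y) that are nonnegative, sum to one at every point, and vanish outside each expert's patch. This sketch assumes compactly supported Wendland C2 kernels, normalized Shepard-style, on a fixed grid of patch centers. The paper's actual weight functions, patch layout, and expert architecture may differ, and all names here (`pou_weights`, `pou_moe_trunk`, `make_expert`) are hypothetical.

```python
import numpy as np

def wendland_c2(r):
    """Wendland C2 kernel: (1 - r)^4 (4r + 1) for r < 1, and 0 otherwise."""
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def pou_weights(y, centers, radius):
    """Shepard-normalized PoU weights, shape (n_points, n_patches); rows sum to 1."""
    dist = np.linalg.norm(y[:, None, :] - centers[None, :, :], axis=-1)
    phi = wendland_c2(dist / radius)
    total = phi.sum(axis=1, keepdims=True)
    if np.any(total == 0.0):
        raise ValueError("some query points lie outside every patch")
    return phi / total

def pou_moe_trunk(y, experts, centers, radius, p):
    """Blend local expert trunks: t(y) = sum_i w_i(y) * T_i(y), shape (n_points, p)."""
    w = pou_weights(y, centers, radius)
    out = np.zeros((y.shape[0], p))
    for i, expert in enumerate(experts):
        active = w[:, i] > 0.0          # points inside patch i
        if active.any():                # an expert is evaluated only on its own patch
            out[active] += w[active, i:i + 1] * expert(y[active])
    return out

def make_expert(A):
    """Hypothetical local expert: a one-layer tanh network with weight matrix A."""
    return lambda y: np.tanh(y @ A)

rng = np.random.default_rng(0)
g = np.linspace(0.0, 1.0, 3)
centers = np.array([[a, b] for a in g for b in g])   # 3 x 3 grid of patch centers
radius = 0.6      # > sqrt(2) * 0.25, so every point of [0, 1]^2 is covered
p = 8
experts = [make_expert(rng.standard_normal((2, p))) for _ in range(len(centers))]
y = rng.uniform(size=(200, 2))
t = pou_moe_trunk(y, experts, centers, radius, p)
```

Since each weight is exactly zero outside its patch, most experts contribute nothing at a given point, which is one way a PoU construction yields spatial locality and sparsity. If every expert were the same function f, the blend would reproduce f exactly, because the weights sum to one.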
Abstract
We present a novel deep operator network (DeepONet) architecture for operator learning, the ensemble DeepONet, that allows for enriching the trunk network of a single DeepONet with multiple distinct trunk networks. This trunk enrichment allows for greater expressivity and generalization capabilities over a range of operator learning problems. We also present a spatial mixture-of-experts (MoE) DeepONet trunk network architecture that utilizes a partition-of-unity (PoU) approximation to promote spatial locality and model sparsity in the operator learning problem. We first prove that both the ensemble and PoU-MoE DeepONets are universal approximators. We then demonstrate that ensemble DeepONets containing a trunk ensemble of a standard trunk, the PoU-MoE trunk, and/or a proper orthogonal decomposition (POD) trunk can achieve 2-4x lower relative errors than standard DeepONets and…
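A POD trunk replaces a learned trunk with fixed basis functions computed from training data. The sketch below assumes the common POD-DeepONet recipe: subtract the mean of the training output snapshots on a fixed grid, take the leading right singular vectors as modes, and let the branch predict one coefficient per mode. The function names and synthetic data are hypothetical, and least-squares projection coefficients stand in for a trained branch network.

```python
import numpy as np

def pod_basis(snapshots, p):
    """Mean and leading p POD modes of output snapshots, shape (n_samples, n_points)."""
    mean = snapshots.mean(axis=0)
    _, _, vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, vt[:p]                  # modes: (p, n_points), orthonormal rows

def pod_deeponet_predict(coeffs, mean, modes):
    """G(u)(y_j) ~= mean_j + sum_k coeffs_k * modes_k(y_j) on the training grid."""
    return mean + coeffs @ modes

x = np.linspace(0.0, 1.0, 128)           # fixed output grid (illustrative)
rng = np.random.default_rng(0)
amps = rng.standard_normal((40, 3))
basis = np.stack([np.sin((k + 1) * np.pi * x) for k in range(3)])
S = amps @ basis                         # 40 synthetic snapshots spanning 3 sine modes
mean, modes = pod_basis(S, p=3)
coeffs = (S - mean) @ modes.T            # a trained branch would predict these instead
recon = pod_deeponet_predict(coeffs, mean, modes)
```

Because the modes are defined only at the training grid points, this trunk is evaluated at those points unless the modes are interpolated.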
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. Integrating the mixture-of-experts paradigm into operator learning enhances the model's capacity to learn operators effectively.
2. Building on the mixture-of-experts approach, the authors introduce a partition-of-unity strategy to encourage spatial locality and promote model sparsity.
3. An MoE model enhances the model's capacity by incorporating a diverse set of basis functions, which improves approximation accuracy across various operator learning tasks.
4. Comprehensive ablat
While the paper claims to present a novel framework combining classical expert neural networks, it largely repackages existing concepts rather than introducing groundbreaking ideas. The mixture-of-experts (MoE) paradigm is a well-established approach in machine learning; although its application to scientific ML is innovative, it does not fundamentally transform the field. The combination of these elements lacks a compelling new problem formulation or a significant shift in methodology. The quality of th
- The partition-of-unity mixture-of-experts (PoU-MoE) trunk introduces a novel approach that enhances spatial locality and promotes model sparsity.
- As reported, the ensemble DeepONets, particularly the POD-PoU variant, achieve substantial error reductions (2-4x) compared to standard DeepONets.
- Universal approximation capabilities are analyzed.
- The presentation could be improved. For instance, a clear description and detailed experimental setup for each baseline in Table 1 should be provided.
- The scope of this work appears somewhat limited, as it primarily focuses on testing enrichment strategies for basis functions within the specific context of operator learning. Applying these strategies to other popular frameworks is not trivial. Although the authors suggest that these methods might extend to FNO, sufficient details and evidenc
1. The paper presents its technical content in a clear, organized manner.
2. The integration of partition-of-unity principles into the DeepONet framework represents an interesting approach to incorporating spatial locality.
3. The work combines theoretical analysis (universal approximation theorems) with systematic empirical validation.
4. The proposed method demonstrates consistent performance improvements over standard DeepONets across multiple PDE examples.
1. The experimental comparisons are primarily focused on DeepONet variants, omitting comparisons with other popular neural operators like FNO, which would provide broader context for the method's effectiveness.
2. The time-dependent PDE examples are restricted to single-step predictions (from one time point to another), leaving open questions about the method's capability to learn full temporal trajectories when time coordinates are included in the trunk network inputs.
3. The dependence on p
