Ensemble and Mixture-of-Experts DeepONets For Operator Learning

Ramansh Sharma; Varun Shankar

arXiv:2405.11907 · cs.LG · March 18, 2025

TL;DR

This paper introduces ensemble and mixture-of-experts DeepONet architectures that enhance expressivity and spatial locality, with ensemble DeepONets achieving 2-4x lower relative $\ell_2$ errors than standard DeepONets on operator learning tasks involving PDEs.

Contribution

The paper proposes two novel architectures, ensemble DeepONets and partition-of-unity mixture-of-experts (PoU-MoE) DeepONets, proves that both are universal approximators, and shows that they improve accuracy and spatial modeling in operator learning.
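The ensemble idea can be read as widening the trunk's set of basis functions: several trunks are evaluated at the query point, their outputs are concatenated, and the branch supplies one coefficient per concatenated basis function. The minimal NumPy sketch below assumes every trunk is a small MLP emitting p basis values and the branch emits K·p matching coefficients; the weights are random, not trained. The paper's ensembles instead combine a standard, a PoU-MoE, and/or a POD trunk, and the official rsmath/ensemble-deeponet repository is written in JAX, so the helpers here (`init_mlp`, `mlp`, `ensemble_deeponet`) are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random (untrained) weights for a tanh MLP with the given layer sizes."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        params.append((W, np.zeros(n_out)))
    return params

def mlp(params, x):
    """Apply the MLP: tanh on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def ensemble_deeponet(branch, trunks, bias, u, y):
    """Hypothetical ensemble DeepONet: G(u)(y) ~ <branch(u), [trunk_1(y), ..., trunk_K(y)]> + bias.

    u: (n_funcs, m) input functions sampled at m sensors
    y: (n_points, d) query coordinates
    returns: (n_funcs, n_points)
    """
    # Concatenate every trunk's basis functions side by side: (n_points, K*p)
    basis = np.concatenate([mlp(t, y) for t in trunks], axis=1)
    coeffs = mlp(branch, u)  # one coefficient per basis function: (n_funcs, K*p)
    return coeffs @ basis.T + bias

rng = np.random.default_rng(0)
m, d, p, K = 32, 2, 16, 3  # sensors, coordinate dim, basis size per trunk, number of trunks
trunks = [init_mlp(rng, [d, 64, p]) for _ in range(K)]
branch = init_mlp(rng, [m, 64, K * p])
u = rng.normal(size=(5, m))
y = rng.uniform(size=(100, d))
out = ensemble_deeponet(branch, trunks, 0.0, u, y)
print(out.shape)  # (5, 100)
```

Because the output is a plain inner product, it decomposes into one standard DeepONet contribution per trunk, each using its own slice of the branch coefficients; enrichment therefore only enlarges the basis and never removes what a single trunk can represent.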

Findings

01

Ensemble DeepONets achieve 2-4x lower relative $\ell_2$ errors than standard DeepONets.

02

PoU-MoE DeepONet promotes spatial locality and sparsity.

03

Both architectures are proven to be universal approximators.
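The locality and sparsity in finding 02 come from the partition-of-unity blend: each expert trunk owns a patch of the domain, and a point's trunk output is a weighted sum of only the experts whose patches contain it, with weights that are nonnegative and sum to one. The NumPy sketch below is a hedged illustration, not the paper's exact construction: it assumes Shepard-normalized Wendland C2 bumps as the PoU weights, a 2×2 grid of circular patches on the unit square, and toy `tanh` experts. All names (`wendland_c2`, `pou_weights`, `pou_moe_trunk`) are hypothetical.

```python
import numpy as np

def wendland_c2(r):
    """Compactly supported Wendland C2 bump: (1 - r)^4 (4r + 1) for r < 1, else 0."""
    r = np.clip(r, 0.0, None)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def pou_weights(y, centers, radius):
    """Shepard-normalized partition-of-unity weights, shape (n, P).

    Assumes every point lies within `radius` of at least one patch center,
    otherwise the normalization divides by zero.
    """
    dist = np.linalg.norm(y[:, None, :] - centers[None, :, :], axis=-1)
    phi = wendland_c2(dist / radius)  # zero outside each patch
    return phi / phi.sum(axis=1, keepdims=True)

def pou_moe_trunk(y, centers, radius, experts):
    """Blend local experts: tau(y) = sum_i w_i(y) * expert_i(y)."""
    w = pou_weights(y, centers, radius)                # (n, P)
    outs = np.stack([e(y) for e in experts], axis=1)   # (n, P, p)
    return np.einsum("np,npk->nk", w, outs)

rng = np.random.default_rng(0)
p = 8
centers = np.array([[0.25, 0.25], [0.75, 0.25], [0.25, 0.75], [0.75, 0.75]])
radius = 0.5  # covers the unit square: every point is within ~0.354 of some center
mats = [rng.normal(size=(2, p)) for _ in centers]
experts = [lambda y, A=A: np.tanh(y @ A) for A in mats]  # toy stand-ins for expert trunks
y = rng.uniform(size=(200, 2))
tau = pou_moe_trunk(y, centers, radius, experts)
print(tau.shape)  # (200, 8)
```

For clarity this sketch evaluates every expert at every point; the sparsity benefit appears when an implementation evaluates only the experts with nonzero weight at each point. A point near the corner (0.05, 0.05), for example, receives weight only from the lower-left patch.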

Abstract

We present a novel deep operator network (DeepONet) architecture for operator learning, the ensemble DeepONet, that allows for enriching the trunk network of a single DeepONet with multiple distinct trunk networks. This trunk enrichment allows for greater expressivity and generalization capabilities over a range of operator learning problems. We also present a spatial mixture-of-experts (MoE) DeepONet trunk network architecture that utilizes a partition-of-unity (PoU) approximation to promote spatial locality and model sparsity in the operator learning problem. We first prove that both the ensemble and PoU-MoE DeepONets are universal approximators. We then demonstrate that ensemble DeepONets containing a trunk ensemble of a standard trunk, the PoU-MoE trunk, and/or a proper orthogonal decomposition (POD) trunk can achieve 2-4x lower relative $\ell_2$ errors than standard DeepONets and…
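A POD trunk, one of the ensemble members named in the abstract, replaces a learned trunk with fixed basis functions taken from the training outputs. A common construction, assumed here rather than taken from the paper, subtracts the mean output and keeps the leading right singular vectors of the centered snapshot matrix. The sketch also computes a relative $\ell_2$ error averaged over samples, which is one common definition; the paper's exact averaging may differ. Helper names (`pod_basis`, `relative_l2`) and the synthetic sine data are hypothetical.

```python
import numpy as np

def pod_basis(snapshots, n_modes):
    """POD trunk from training outputs on a fixed grid.

    snapshots: (n_samples, n_points)
    returns: mean (n_points,), modes (n_points, n_modes) with orthonormal columns
    """
    mean = snapshots.mean(axis=0)
    # The right singular vectors of the centered data are the POD modes.
    _, _, vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, vt[:n_modes].T

def relative_l2(pred, true):
    """Mean over samples of ||pred - true||_2 / ||true||_2."""
    return np.mean(np.linalg.norm(pred - true, axis=1) / np.linalg.norm(true, axis=1))

# Synthetic outputs that are exactly rank 3 after centering (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 101)
shapes = np.stack([np.sin(k * np.pi * x) for k in (1, 2, 3)])  # (3, 101)
S = rng.normal(size=(50, 3)) @ shapes + 1.0                    # (50, 101)

mean, modes = pod_basis(S, n_modes=3)
# Stand-in for a branch net: exact projection coefficients onto the POD modes.
coeffs = (S - mean) @ modes
recon = mean + coeffs @ modes.T
print(relative_l2(recon, S))  # round-off level, since the data lie in the POD span
```

In a POD-based DeepONet the branch network would predict `coeffs` from the input function; feeding it the exact projection here isolates the basis itself, which can only represent what lies in the span of the retained modes.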

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. Integrating the mixture-of-experts paradigm into operator learning enhances the model's capacity for effectively learning operators.
2. Building on the mixture-of-experts approach, the authors introduce a partition-of-unity strategy to encourage spatial locality and promote model sparsity.
3. An MoE model enhances the model's capacity by incorporating a diverse set of basis functions, which enables improved approximation accuracy across various operator learning tasks.
4. Comprehensive ablat…

Weaknesses

While the paper claims to present a novel framework combining classical expert neural networks, it largely repackages existing concepts rather than introducing groundbreaking ideas. The mixture-of-experts (MoE) paradigm is a well-established approach in machine learning; although innovative when applied to scientific ML, it does not fundamentally transform the field. The combination of these elements lacks a compelling new problem formulation or a significant shift in methodology. The quality of th…

Reviewer 02 · Rating 3 · Confidence 3

Strengths

- The partition-of-unity mixture-of-experts (PoU-MoE) trunk introduces a novel approach that enhances spatial locality and promotes model sparsity.
- As reported, the ensemble DeepONets, particularly the POD-PoU variant, achieve substantial error reductions (2-4x) compared to standard DeepONets.
- Universal approximation capabilities are analyzed.

Weaknesses

- The presentation could be improved. For instance, a clear description and detailed experimental setup for each baseline in Table 1 should be provided.
- The scope of this work appears somewhat limited, as it primarily focuses on testing enrichment strategies for basis functions within the specific context of operator learning. Applying these strategies to other popular frameworks is not trivial. Although the authors suggest that these methods might extend to FNO, sufficient details and evidenc…

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. The paper presents its technical content in a clear, organized manner.
2. The integration of partition-of-unity principles into the DeepONet framework represents an interesting approach to incorporating spatial locality.
3. The work combines theoretical analysis (universal approximation theorems) with systematic empirical validation.
4. The proposed method demonstrates consistent performance improvements over standard DeepONets across multiple PDE examples.

Weaknesses

1. The experimental comparisons are primarily focused on DeepONet variants, omitting comparisons with other popular neural operators like FNO, which would provide broader context for the method's effectiveness.
2. The time-dependent PDE examples are restricted to single-step predictions (from one time point to another), leaving open questions about the method's capability to learn full temporal trajectories when time coordinates are included in the trunk network inputs.
3. The dependence on p…

Code & Models

Repositories

rsmath/ensemble-deeponet · JAX · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Seismology and Earthquake Studies