Scaling physics-informed hard constraints with mixture-of-experts

Nithin Chalapathi; Yiheng Du; Aditi Krishnapriyan

arXiv:2402.13412·cs.LG·February 22, 2024·3 cites

TL;DR

This paper introduces a scalable Mixture-of-Experts approach to enforce hard physical constraints in neural networks, significantly improving accuracy, stability, and efficiency in modeling complex physical systems.

Contribution

It develops a novel scalable method using MoE to impose hard physics constraints efficiently across decomposed domains in neural networks.

Findings

01 Achieves higher accuracy in neural PDE solvers.

02 Reduces computation time during training and inference.

03 Enhances training stability for complex dynamical systems.

Abstract

Imposing known physical constraints, such as conservation laws, during neural network training introduces an inductive bias that can improve accuracy, reliability, convergence, and data efficiency for modeling physical dynamics. While such constraints can be softly imposed via loss function penalties, recent advancements in differentiable physics and optimization improve performance by incorporating PDE-constrained optimization as individual layers in neural networks. This enables a stricter adherence to physical constraints. However, imposing hard constraints significantly increases computational and memory costs, especially for complex dynamical systems. This is because it requires solving an optimization problem over a large number of points in a mesh, representing spatial and temporal discretizations, which greatly increases the complexity of the constraint. To address this…
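To make the idea of a hard constraint concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a toy linear ODE, u'(x) + u(x) = 0 with u(0) = 1, and a fixed monomial basis standing in for basis functions that a network would predict. The names `basis`, `basis_dx`, and `hard_constraint_solve` are hypothetical. The "layer" solves for weights w so that u(x) = b(x)^T w satisfies the equation at the sampled points, instead of penalizing the residual in a loss.

```python
import numpy as np

def basis(x, n):
    """Monomial basis b_j(x) = x**j, returned with shape (len(x), n)."""
    return x[:, None] ** np.arange(n)

def basis_dx(x, n):
    """Derivative of the monomial basis, shape (len(x), n)."""
    j = np.arange(1, n)
    out = np.zeros((len(x), n))
    out[:, 1:] = j * x[:, None] ** (j - 1)
    return out

def hard_constraint_solve(x_col, n_basis):
    """Solve for w so that u = b^T w meets the ODE and initial condition.

    Assumed toy problem: u'(x) + u(x) = 0 at x_col, and u(0) = 1.
    """
    # One row per collocation point: (b'(x_i) + b(x_i)) @ w = 0
    A_pde = basis_dx(x_col, n_basis) + basis(x_col, n_basis)
    # One row for the initial condition: b(0) @ w = 1
    A_ic = basis(np.array([0.0]), n_basis)
    A = np.vstack([A_pde, A_ic])
    rhs = np.concatenate([np.zeros(len(x_col)), [1.0]])
    # Underdetermined system: lstsq returns the minimum-norm solution
    w, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return w, A, rhs

x_col = np.linspace(0.1, 1.0, 6)          # sampled points in the domain
w, A, rhs = hard_constraint_solve(x_col, n_basis=8)
max_residual = np.max(np.abs(A @ w - rhs))
u_at_zero = float((basis(np.array([0.0]), 8) @ w)[0])
```

The system grows with the number of sampled points, which is the cost the abstract describes: on a fine spatiotemporal mesh, a single solve of this kind becomes expensive in both time and memory.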

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

**Novelty:** I am not in a position to judge the novelty of the paper as there is very little overlap with my work. From the related work mentioned in the paper, the paper seems novel. **Significance:** The paper is of high interest to a subset of the ML community.

Weaknesses

**Clarity:** The paper is mostly quite clear. Personally, I would have benefitted from an ongoing example, where it is made clear what the outputs of the neural networks are, what the constraints are, and how the problem is divided in that case. Even better, it would have been nice to use the same example to move from one constraint to multiple ones.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The paper proposes a new way to decompose complex dynamical systems with a large number of points in the spatiotemporal domain into smaller solvable systems, by utilizing the mixture-of-experts method. It makes the system scalable and performant (with parallel computing).
- In the 2 test cases provided in the paper, the paper's method has higher accuracy and lower time cost compared with the other two methods.
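The decomposition the reviewer describes can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the paper's code: the constraint is u'(x) = cos(x), the basis is a fixed monomial basis, and the sizes `K`, `M`, `N` and the helper `basis_dx` are hypothetical. Each of the K experts owns M sampled points and solves its own small constrained system, and the K solves are independent, so they can run as one batched call.

```python
import numpy as np

def basis_dx(x, n):
    """Derivative of the monomial basis x**j; output shape x.shape + (n,)."""
    j = np.arange(1, n)
    out = np.zeros(x.shape + (n,))
    out[..., 1:] = j * x[..., None] ** (j - 1)
    return out

K, M, N = 4, 4, 8   # experts, points per expert, basis functions (assumed)

# Split the sampled points into K contiguous subdomains; expert k owns row k.
x = np.linspace(0.0, 1.0, K * M).reshape(K, M)

# Per-expert constraint: b'(x_i) @ w_k = cos(x_i) for the expert's own points.
A = basis_dx(x, N)      # shape (K, M, N)
rhs = np.cos(x)         # shape (K, M)

# Batched minimum-norm solve: K independent M x N systems in one call.
W = (np.linalg.pinv(A) @ rhs[..., None])[..., 0]   # shape (K, N)

# Each expert satisfies the constraint on the points it owns.
residual = np.einsum("kmn,kn->km", A, W) - rhs
max_residual = np.abs(residual).max()
```

The design point is that K systems of size M x N replace one system of size (K·M) x N, and the batch dimension maps naturally onto parallel hardware.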

Weaknesses

The experiments are relatively limited: the paper only tests on 2 cases, one for 1D and another for 2D. In each experiment, only one set of environment parameters is tested (e.g., only Reynolds number = 1e4 is used for the Navier-Stokes case).

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

The paper is well written and easy to follow. The method takes inspiration from domain decomposition, which is standard in the literature for solving Partial Differential Equations. I appreciated the details in the Methods section, particularly the explanations of the forward and backward pass of the architecture. The method clearly outperforms the existing soft- and hard-constrained baselines. The paper's scaling claim is well supported by a solid inference time analysis.

Weaknesses

Unless I am mistaken, there is no clear formulation of the output function $u(x, t)$ at inference except in Figure 1, and if I understand the figure correctly, then $u(x, t) = \sum_k b(x,t)^T w_k =b(x,t)^T( \sum_k w_k)$. In this case, the sum of $w_k$ is the weights used to query the function over the domain $\Omega$, and as $w \neq w_k$ a priori, we do not know if the constraints are hardly imposed on any sampled points. Therefore, at inference I do not think that the PDE can be constrained in

Code & Models

Repositories

ask-berkeley/physics-nns-hard-constraints · JAX · Official

Videos

Scaling physics-informed hard constraints with mixture-of-experts · SlidesLive

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Advanced Multi-Objective Optimization Algorithms