On Training Derivative-Constrained Neural Networks

KaiChieh Lo; Daniel Huang

arXiv:2310.01649 · cs.LG · October 13, 2023

TL;DR

This paper introduces an integrated ReLU (IReLU) activation function and investigates two stabilization techniques, denormalization and label rescaling, to improve the training of derivative-constrained neural networks in physics-informed applications such as quantum chemistry and SciML.

Contribution

It proposes the IReLU activation and studies denormalization and label rescaling as stabilization methods for derivative-constrained neural networks, evaluated on quantum chemistry and SciML tasks.

Findings

01. IReLU improves the training of derivative-constrained (DC) neural networks.

02. Denormalization and label rescaling help stabilize DC training.

03. Existing architectures that combine IReLU with denormalization and label rescaling better incorporate the training signal from derivative constraints in physics-informed tasks.

Abstract

We refer to the setting where the (partial) derivatives of a neural network's (NN's) predictions with respect to its inputs are used as additional training signal as a derivative-constrained (DC) NN. This situation is common in physics-informed settings in the natural sciences. We propose an integrated RELU (IReLU) activation function to improve training of DC NNs. We also investigate denormalization and label rescaling to help stabilize DC training. We evaluate our methods on physics-informed settings including quantum chemistry and Scientific Machine Learning (SciML) tasks. We demonstrate that existing architectures with IReLU activations combined with denormalization and label rescaling better incorporate training signal provided by derivative constraints.
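
The derivative-constrained setting in the abstract can be illustrated with a small sketch in plain Python. Everything here is an assumption made for illustration: the one-unit `tanh` model, the parameter names `w1`, `b`, `w2`, the weight `lam`, and the toy labels are hypothetical and are not the authors' architecture, loss, or data. The point is only that the loss has one term on the predictions and a second term on the derivative of the predictions with respect to the input.

```python
import math

def predict(params, x):
    """Toy scalar model f(x) = w2 * tanh(w1 * x + b) (illustrative only)."""
    w1, b, w2 = params
    return w2 * math.tanh(w1 * x + b)

def predict_dx(params, x):
    """Analytic derivative df/dx of the toy model with respect to its input."""
    w1, b, w2 = params
    t = math.tanh(w1 * x + b)
    return w2 * w1 * (1.0 - t * t)

def dc_loss(params, xs, ys, dys, lam=1.0):
    """MSE on values plus lam times MSE on input derivatives."""
    n = len(xs)
    value_term = sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / n
    deriv_term = sum((predict_dx(params, x) - dy) ** 2 for x, dy in zip(xs, dys)) / n
    return value_term + lam * deriv_term

# Toy labels generated from known parameters, so the loss vanishes there.
true_params = (1.5, -0.2, 2.0)
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [predict(true_params, x) for x in xs]       # value labels
dys = [predict_dx(true_params, x) for x in xs]   # derivative labels

loss_at_truth = dc_loss(true_params, xs, ys, dys)
loss_elsewhere = dc_loss((1.0, 0.0, 1.0), xs, ys, dys)
```

In a real DC network the input derivative would come from automatic differentiation rather than a hand-written formula, and the relative weight of the two terms (here `lam`) is the quantity that is sensitive to the numerical scales of the labels.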

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · reject, not good enough · Confidence 4

Strengths

- The paper tries to address an important challenge with PINNs, concerning the discrepancy between loss terms in physics-informed training.
- The direction taken by authors in focusing on the units of derivatives and labels is interesting.

Weaknesses

- As also pointed out by the authors, the proposed IRELU activation has limited usability in physics-informed models, where one might need derivatives of an arbitrary order w.r.t. inputs, while derivatives for IRELU are $0$ for third and higher order derivatives. Even for second order PDEs, the effects of a constant second derivative of $1$ in IRELU (for $x>0$) need more attention and study.
- Preventing vanishing and exploding gradients is one major characteristic of RELU. The gradient propaga
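
The derivative behavior described in this review can be checked numerically. The sketch below assumes that IReLU is the antiderivative of ReLU, that is $x^2/2$ for $x>0$ and $0$ otherwise; this form matches the constant second derivative of $1$ for $x>0$ mentioned above, but it is not stated on this page, and the function names are hypothetical.

```python
def irelu(x):
    """Assumed form: integral of ReLU, x**2 / 2 for x > 0 and 0 otherwise."""
    return 0.5 * x * x if x > 0 else 0.0

def irelu_d1(x):
    """First derivative of the assumed form: ReLU."""
    return x if x > 0 else 0.0

def irelu_d2(x):
    """Second derivative: 1 for x > 0, 0 for x < 0."""
    return 1.0 if x > 0 else 0.0

def irelu_d3(x):
    """Third derivative: 0 everywhere away from x = 0."""
    return 0.0

def central_diff(f, x, h=1e-4):
    """Central finite difference, used only to cross-check the formulas."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Compare each derivative against a finite difference of the one below it,
# at points away from the kink at x = 0.
points = [-2.0, -0.5, 0.5, 2.0]
checks = [
    (
        abs(central_diff(irelu, x) - irelu_d1(x)),
        abs(central_diff(irelu_d1, x) - irelu_d2(x)),
        abs(central_diff(irelu_d2, x) - irelu_d3(x)),
    )
    for x in points
]
```

Under this assumption, any loss term that depends on third or higher input derivatives of a single activation receives no signal from it, which is the limitation the reviewer raises.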

Reviewer 02 · Rating 3 · reject, not good enough · Confidence 4

Strengths

* The paper tackles an important problem of stabilizing training of derivative-constrained neural networks.
* The writing of the paper is clear.
* Experiments are conducted on a wide range of tasks and architectures.

Weaknesses

* The authors should introduce a name for their proposed method.
* The ideas of the paper are quite incremental and mostly based on intuition. For example, the authors hypothesize that derivative-constrained NNs are sensitive to units without any theoretical/empirical evidence.

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

- **The paper addresses an important open problem and presents concrete improvements over a comprehensive set of experiments.** Training derivative constrained networks is central to solving a lot of the physics-related applications, and it can be difficult in practice since it is hard to balance the derivative constraint in the total loss function. This paper proposes methods that address this difficulty by respecting the different numerical scales of the system. Improvements across many differ

Weaknesses

- **The organization of the paper has room for improvement.** There is a motivation section in which the authors write about the experiments in detail, including the dataset size, numerical scales and units of the quantities of interest, but these details do not directly contribute to motivating the proposed methods.
- **Some observations are left unexplained/unexplored.** In section 3 results, the authors observed that the "energy loss divided by 1000 is typically much lower than the force los

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Computational Physics and Python Applications