Astral: training physics-informed neural networks with error majorants

Vladimir Fanaskov; Tianchi Yu; Alexander Rudikov; Ivan Oseledets

arXiv:2406.02645·physics.comp-ph·March 3, 2026·1 cites

Open Access · 3 Reviews

TL;DR

This paper introduces Astral, a novel loss function for physics-informed neural networks that directly estimates error bounds, enabling more reliable training and error assessment compared to traditional residual minimization methods.

Contribution

The paper proposes a new error majorant-based loss function, Astral, for physics-informed neural networks, improving error estimation and convergence speed over residual-based approaches.

Findings

01 · Astral loss provides reliable error estimates with tight bounds.

02 · Astral loss leads to faster convergence and lower errors.

03 · Error estimates from Astral are better correlated with actual error.

Abstract

The primal approach to physics-informed learning is residual minimization. We argue that the residual is, at best, an indirect measure of the error of the approximate solution and propose to train with an error majorant instead. Since the error majorant provides a direct upper bound on the error, one can reliably estimate how close the PiNN is to the exact solution and stop the optimization process when the desired accuracy is reached. We call the loss function associated with the error majorant Astral: neurAl a poSTerioRi functionAl Loss. To compare Astral and residual loss functions, we illustrate how error majorants can be derived for various PDEs and conduct experiments with diffusion equations (including anisotropic and in the L-shaped domain), convection-diffusion equation, temporal discretization of Maxwell's equation, magnetostatics and nonlinear…
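As a rough illustration of what "a direct upper bound on error" can mean, the sketch below checks a classical functional a posteriori majorant for the 1D Poisson problem $-u'' = f$ on $(0,1)$ with zero Dirichlet data: $\|u' - v'\| \le \|y - v'\| + C_F \|y' + f\|$, with $C_F = 1/\pi$ the Friedrichs constant of the unit interval. This is an assumed textbook form, not necessarily the exact functional used in the paper; the test problem, the approximation `v`, the auxiliary flux `y`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def l2_norm(values, x):
    """L2(0,1) norm of sampled values via the composite trapezoidal rule."""
    sq = values ** 2
    return np.sqrt(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(x)))

def majorant(grad_v, y, div_y, f, x, c_friedrichs=1.0 / np.pi):
    """Assumed textbook majorant for -u'' = f with u(0) = u(1) = 0.

    grad_v : derivative of the approximate solution v
    y      : auxiliary flux (any sufficiently smooth function)
    div_y  : derivative of y
    """
    return l2_norm(y - grad_v, x) + c_friedrichs * l2_norm(div_y + f, x)

x = np.linspace(0.0, 1.0, 2001)

# Hypothetical test problem: exact solution u = sin(pi x), so f = pi^2 sin(pi x).
f = np.pi ** 2 * np.sin(np.pi * x)
grad_u = np.pi * np.cos(np.pi * x)

# Hypothetical approximation v = 4 x (1 - x), which satisfies the boundary data.
grad_v = 4.0 - 8.0 * x

# Energy-norm error that the majorant is supposed to bound.
true_error = l2_norm(grad_u - grad_v, x)

# Choice 1: y = v'. The first term vanishes and only the weighted
# residual ||v'' + f|| remains, which gives a valid but loose bound.
bound_crude = majorant(grad_v, grad_v, np.full_like(x, -8.0), f, x)

# Choice 2: y = u' (the exact flux, so y' = -f). The second term
# vanishes and the bound coincides with the true error.
bound_sharp = majorant(grad_v, grad_u, -f, f, x)

print(f"true error  : {true_error:.4f}")
print(f"crude bound : {bound_crude:.4f}")
print(f"sharp bound : {bound_sharp:.4f}")
```

The two choices of `y` bracket what training could do under this assumed form: minimizing the majorant over both `v` and `y` tightens the bound towards the true error, whereas the residual-only choice stays a valid but much looser estimate.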

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 3

Strengths

- The bounds for the new losses are clearly and consistently derived.
- Clear and pedagogical presentation.

Weaknesses

- The proposed solutions are specific to each equation, which limits the applicative scope of the results.
- Empirical results are not sufficiently convincing. Only a few examples show an improvement in errors, but in most cases this remains of the same order of magnitude. It is therefore unclear whether these improvements are significant, or simply the result of statistical or numerical fluctuations.
- As has already been shown in several series of works, what affects the lack of convergence of

Reviewer 02 · Rating 5 · Confidence 4

Strengths

- Moving away from strong formulations to first-order systems (the error majorants presented are not exactly first-order systems, but relatively close) seems like a good idea.
- Good error estimators are useful in practice.

Weaknesses

- Novelty: Although the proposed loss functions are not exactly first-order system reformulations of the considered PDEs, they share a similar spirit -- no second derivatives are needed but instead auxiliary variables are introduced. However, first-order system formulations are not novel, not even for neural network based solution methods for PDEs, see for example the works of [Cai, arXiv:1911.02109] or [Schwab, arXiv:2409.20264]. So I believe it is crucial to understand if the advantages of the

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The idea of training PINNs with the error majorant is a novel idea that, at least for some PDE classes, appears to be quite promising. The paper is generally well written and, for the most part, quite accessible. I particularly commend the intuitive motivation in Section 2.1, and the fact that a large number of PINNs (100) were trained for each setting, thus ruling out random effects. Some readers may consider the fact that the majorant must be derived/available for the considered PDE class as a

Weaknesses

At some parts, the paper is not well written and somewhat unclear. For example, I think the authors switch between $\phi$ in Section 3, which is a general notation for the solution of the PDE, and problem-specific notation ($\phi$, $B$, $u$, etc.) in Section 4. In Section 5, the solution for diffusion is given as $u$, while in Section 4 it is given as $\phi$ (if I understand correctly). In Section 2.3, the surrogate function (flux) is denoted as $\tilde{F}$, while I think this corresponds to $\o

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Diffusion