Morephy-Net: An Evolutionary Multi-objective Optimization for Replica-Exchange-based Physics-informed Neural Operator Learning Networks
Binghang Lu, Changhong Mou, Guang Lin

TL;DR
Morephy-Net introduces an evolutionary multi-objective optimization framework combined with replica-exchange dynamics to improve physics-informed neural operator learning, especially under noisy data, by balancing multiple objectives and quantifying uncertainty.
Contribution
It presents a novel integration of multi-objective optimization and replica-exchange sampling to enhance robustness and accuracy in physics-informed neural operators for PDEs.
Findings
Improved accuracy over baseline models
Enhanced robustness to noisy data
Reliable uncertainty quantification
Abstract
We propose an evolutionary Multi-objective Optimization for Replica-Exchange-based Physics-informed operator-learning Networks (Morephy-Net) to solve parametric partial differential equations (PDEs) in noisy data regimes, for both forward prediction and inverse identification. Existing physics-informed neural networks and operator-learning models (e.g., DeepONets and Fourier neural operators) often face three coupled challenges: (i) balancing data/operator and physics residual losses, (ii) maintaining robustness under noisy or sparse observations, and (iii) providing reliable uncertainty quantification. Morephy-Net addresses these issues by integrating: (i) evolutionary multi-objective optimization that treats data/operator and physics residual terms as separate objectives and searches the Pareto front, thereby avoiding ad hoc loss weighting; (ii) replica-exchange stochastic gradient…
| Acronym | Physics-informed | Uncertainty Quantification | Fourier Feature |
|---|---|---|---|
| DeepONet | – | – | – |
| PI-DON | Yes | – | – |
| PI-FDON | Yes | – | Yes |
| Morephy-Net | Yes | Yes | Yes |
| Method | Wall Time (s) | RE |
|---|---|---|
| DeepONet | 270 | 0.0550 |
| Morephy-Net | 450 | 0.0436 |
| Model | Residual Loss | IBC Loss | RE |
|---|---|---|---|
| DeepONet | N/A | 0.0008 | 0.0550 |
| PI-DON | 0.0007 | 0.0003 | 0.0510 |
| PI-FDON | 0.0004 | 0.0003 | 0.0446 |
| Morephy-Net | 0.0002 | 0.0001 | 0.0436 |
| Model | Residual | IBC loss | Data loss | Predicted | RE |
|---|---|---|---|---|---|
| DeepONet | N/A | 0.0007 | 0.0009 | 0.0161 | 0.1174 |
| PI-DON | 0.0010 | 0.0043 | 0.0015 | 0.0098 | 0.0700 |
| PI-FDON | 0.0008 | 0.0026 | 0.0008 | 0.0073 | 0.0560 |
| Morephy-Net | 0.0003 | 0.0005 | 0.0001 | 0.0062 | 0.0358 |
| Model | Residual Loss | IBC loss | Data loss | Predicted | RE |
|---|---|---|---|---|---|
| DeepONet | N/A | 0.0068 | 0.5762 | 0.0086 | 0.1421 |
| PI-DON | 0.0084 | 0.0042 | 0.3461 | 0.0078 | 0.1260 |
| PI-FDON | 0.0074 | 0.0026 | 0.4653 | 0.0073 | 0.1249 |
| Morephy-Net | 0.0043 | 0.0021 | 0.2713 | 0.0063 | 0.0744 |
| Model | Residual loss | BC loss | RE |
|---|---|---|---|
| DON | 0.0903 | 0.0173 | 0.3080 |
| PI-DON | 0.0703 | 0.0153 | 0.1620 |
| PI-FDON | 0.0539 | 0.0148 | 0.1319 |
| Morephy-Net | 0.0488 | 0.0121 | 0.0955 |
| Model | Residual Loss | BC Loss | Data Loss | Predicted | RE |
|---|---|---|---|---|---|
| DON | 0.0641 | 0.0275 | 0.0503 | 0.6930 | 0.2298 |
| PI-DON | 0.0541 | 0.0148 | 0.0499 | 0.6640 | 0.1813 |
| PI-FDON | 0.0509 | 0.0151 | 0.0458 | 0.5954 | 0.1359 |
| Morephy-Net | 0.0236 | 0.0148 | 0.0384 | 0.5740 | 0.1134 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Morephy-Net: An Evolutionary Multi-objective Optimization for Replica-Exchange-based Physics-informed Neural Operator Learning Networks
Binghang Lu
School of Electrical and Computer Engineering
Purdue University
West Lafayette, IN 47907, USA
&Changhong Mou
Department of Mathematics and Statistics
Utah State University
Logan, UT 84322, USA
&Guang Lin
Department of Mathematics & School of Mechanical Engineering
Purdue University
West Lafayette, IN 47907, USA
[email protected] Corresponding author
Abstract
We propose an evolutionary Multi-objective Optimization for Replica-Exchange-based Physics-informed operator-learning Networks (Morephy-Net) to solve parametric partial differential equations (PDEs) in noisy data regimes, for both forward prediction and inverse identification. Existing physics-informed neural networks and operator-learning models (e.g., DeepONets and Fourier neural operators) often face three coupled challenges: (i) balancing data/operator and physics residual losses, (ii) maintaining robustness under noisy or sparse observations, and (iii) providing reliable uncertainty quantification. Morephy-Net addresses these issues by integrating: (i) evolutionary multi-objective optimization that treats data/operator and physics residual terms as separate objectives and searches the Pareto front, thereby avoiding ad hoc loss weighting; (ii) replica-exchange stochastic gradient Langevin dynamics to enhance global exploration and stabilize training in non-convex landscapes; and (iii) Bayesian uncertainty quantification obtained from stochastic sampling. We validate Morephy-Net on representative forward and inverse problems, including the one-dimensional Burgers equation and the time-fractional mixed diffusion–wave equation. The results demonstrate consistent improvements in accuracy, noise robustness, and calibrated uncertainty estimates over standard operator-learning baselines.
K****eywords Physics-informed Operator Learning Evolutionary Multi-Objective Optimization Uncertainty Quantification Replica Exchange Stochastic Gradient Langevin Dynamics Parametric PDEs Inverse Problem
1 Introduction
The accurate and efficient solution of partial differential equations (PDEs) is indispensable in modeling and understanding a wide range of physical, biological, and engineering systems [17, 58, 19]. Solving parametric PDEs with varying geometries, parameters, and initial/boundary conditions is equivalent to learning the solution operator that maps inputs to their corresponding solutions [25, 31, 21]. For complex systems, traditional methods such as the finite difference method (FDM) and finite element method (FEM) often require prohibitive computational resources as they need fine spatial–temporal discretization and repeated simulations for different parameters and initial/boundary conditions [60, 30, 1, 59, 31]. In recent years, machine learning (ML) has emerged as a powerful and increasingly popular alternative to traditional numerical solvers for parametric PDEs [2, 53, 43, 49, 26, 20, 61, 46]. There are several different ML approaches for solving PDEs. Among them, one prominent category is physics-informed neural networks (PINNs) [54], which incorporate the underlying physical laws directly into the training process. PINNs have shown success in approximating solutions to specific PDE instances and have been extended to learn solution operators that generalize across varying inputs (i.e., parametric PDEs). However, training PINNs for parametric PDEs remains computationally expensive, particularly in multi-query or real-time scenarios, and this setting often requires overly complex parameterizations [64, 28]. Another alternative is operator learning, such as the Deep Operator Network (DeepONet or DON) [43] and Fourier Neural Operator (FNO) [33], which aim to directly learn the mapping from input functions to PDE solutions. However, their performance often deteriorates in challenging settings, including inverse problems, noisy or sparse observational data, and high-dimensional parameter spaces [27].
To address the limitations of traditional operator learning, Wang et al. [65] proposed a physics-informed DeepONet framework for parametric PDEs that combines the operator-learning capability of DeepONets with PDE residual constraints, as in PINNs. This framework reduces the dependence of general operator learning on large amounts of paired high-resolution training data and preserves consistency to the original PDEs with different discretizations. In particular, physics-informed DeepONets adopt a composite loss function consisting of: (1) an operator loss, and (2) a physics loss derived from the PDE residual. [65] showed that physics-informed DeepONets are data-efficient and particularly effective in small-data regimes. Nevertheless, several challenges inherent to the physics-informed DeepONets framework persist. First, similar to PINNs, the training of physics-informed DeepONets admits a multi-objective optimization problem, where the operator loss and the physics loss from the PDE residual must be balanced appropriately [42, 41]. Second, in real applications, the available data are often contaminated with noise. Such noise can diminish the accuracy of the learned operator and even destabilize the training process, particularly in inverse problem settings where the physics loss alone may be insufficient to adequately regularize the solution [27, 12]. Third, since operator learning optimization often relies on gradient-based methods such as the Adam optimizer, the current physics-informed DeepONet framework lacks the ability for uncertainty quantification [69, 73].
To address these challenges, we propose a new Multi-Objective Replica-Exchange Physics-Informed neural-operator Network (Morephy-Net). The novelty of Morephy-Net is to unify (a) principled multi-objective training, (b) replica-exchange posterior sampling, and (c) uncertainty quantification within a single physics-informed operator-learning framework. Concretely, Morephy-Net introduces: (i) an evolutionary multi-objective optimizer that treats the operator/data loss and physics residual loss as separate objectives and searches for Pareto-optimal trade-offs, eliminating manual loss weighting; (ii) replica-exchange stochastic gradient Langevin dynamics (reSGLD) as an efficient sampler to enhance global exploration and reduce the risk of converging to poor local minima; and (iii) built-in uncertainty quantification (UQ) based on the resulting stochastic samples, enabling uncertainty-aware predictions for both forward and inverse problems. Morephy-Net adopts a modified reference-point-based non-dominated sorting method (refined NSGA-III) [7] and further improves diversity along the Pareto front via the proposed refined Pareto sampling (RPS) strategy. By promoting diverse Pareto solutions during training, RPS mitigates Pareto-front clustering and yields more stable trade-offs between data fidelity and physical consistency. The reSGLD component is designed to improve parameter-space exploration. Standard stochastic gradient Langevin dynamics (SGLD) [68] augments stochastic gradient descent with Gaussian noise to approximate posterior sampling during optimization; however, in high-dimensional non-convex landscapes it can still become trapped in local minima. In such cases, reSGLD [9, 10, 72, 34] runs multiple SGLD processes at different temperatures and periodically exchanges their states. High-temperature chains explore broader regions of the energy landscape and facilitate escape from local traps, while low-temperature chains focus on fine-tuning around promising optima. This mechanism enables Morephy-Net to converge faster, deliver more reliable parameter estimates, and quantify uncertainty for both forward and inverse PDE problems. We summarize the main contributions of this work as follows:
Morephy-Net incorporates evolutionary multi-objective optimization into physics-informed operator learning via NSGA-III, treating the data and operator loss and the physics residual loss as distinct objectives and thereby avoiding manual loss weighting. 2. 2.
We propose a refined Pareto sampling (RPS) strategy within NSGA-III to increase sampling diversity by re-selecting distinctive solutions along the Pareto front and mitigating local clustering. 3. 3.
We employ replica-exchange stochastic gradient Langevin dynamics (reSGLD) to enhance global exploration of the parameter space: high-temperature chains traverse broader energy landscapes to escape local minima, while low-temperature chains refine solutions; together, this accelerates convergence and improves accuracy. 4. 4.
The stochastic sampling in reSGLD naturally enables uncertainty quantification (UQ) for physics-informed operator learning in both forward and inverse PDE settings.
The remainder of this paper is organized as follows: Section 2 describes the proposed Morephy-Net framework in detail that includes its main components: physics-informed DeepONet (Section 2.2), NSGA-III multi-objective optimization (Section 2.3), Replica Exchange Stochastic gradient Langevin dynamics (Section 2.4), and its uncertainty quantification (Section 2.5). Section 3 presents numerical experiments on different test problems. Finally, Section 4 concludes the paper and discusses potential future directions.
2 General framework
2.1 Overview
The Multi-objective Optimization for Replica-Exchange-based Physics-informed operator-learning Networks (Morephy-Net)framework consists of several key components, as illustrated schematically in Figure 2. In particular, Morephy-Net consists of the following: (1) Physics-informed Deep Operator Network (PI-DON) architecture. The Deep Operator Network employs two subnetworks—the branch network that includes the initial conditions or PDE parameters, and the trunk network, which may contain the spatial temporal grid information. The training is guided not only by data loss but also by a physics-informed residual loss that enforces the governing PDE constraints (see Section 2.2). (2) Fourier convolution layer. The Fourier convolution layer maps the latent representation into a spectral space, where the underlying information can be more effectively represented by the neural network. This transformation also allows the model to capture global dependency and long-range interactions inherently in data. (3) Multi-objective optimization with replica exchange stochastic gradient Langevin dynamics. In Morephy-Net, the training process integrates the refined Non-dominated Sorting Genetic Algorithm III (refined NSGA-III) algorithm (see Section 2.3), proposed in this paper, with replica exchange stochastic gradient Langevin dynamics (reSGLD) (see Section 2.4) to address the challenges of multi-objective optimization in physics-informed operator learning. The refined NSGA-III serves as the evolutionary mechanism for exploring the Pareto front that produces optimal candidate solutions. These candidates are then further optimized through reSGLD, which improves exploration of the parameter space by including stochastic gradient noise and enabling replica exchanges across parallel chains at different “temperatures.” This stochastic strategy not only helps the models escape local minima but also improves convergence in training.
2.2 Physics Informed Deep Operator Network
The general deep operator network (DeepONet) proposed [43] with mathematical foundation of the universal operator approximation theorem [4] is intended to learn nonlinear operator mappings between infinite-dimensional function spaces. For a general PDE problem, we introduce the solution operator
[TABLE]
where represents the input, which can be the initial condition or parameters of the PDE. A DeepONet is a class of neural network models designed to approximate . In particular, we denote by the DeepONet approximation of , with representing the trainable parameters of the network. A DeepONet consists of two subnetworks—the branch net and the trunk net—and the estimator evaluated at is an inner product of the branch and trunk outputs:
[TABLE]
where is a bias, are the outputs of the branch net, and are the outputs of the trunk net. More specifically, the branch network represents in a discrete format:
[TABLE]
where is a vector that consists of evaluations of the input function at sensor locations . The trunk network may take as input a spatial location . Our goal is to predict the solution corresponding to a given , which is also evaluated at the same input location . The representation of through pointwise evaluations at arbitrary sensor locations provides additional flexibility, both during training and in prediction, particularly when is available only through its values at these sensor points.
Figure 1 outlines the Morephy-Net approach for learning solution operators from noisy and sparse data. The framework addresses common limitations of conventional methods (e.g., loss imbalance and poor exploration) by integrating three components: (A) evolutionary multi-objective optimization (EMO) to adaptively balance data/operator and physics losses along the Pareto front; (B) replica exchange stochastic gradient Langevin dynamics (RE-SGLD) to escape local minima and improve global exploration; and (C) Bayesian uncertainty quantification (UQ) via stochastic sampling. As demonstrated on benchmark problems such as the 1D Burgers and fractional diffusion–wave equations, Morephy-Net yields a robust operator-learning model with improved accuracy, stronger noise resilience, and reliable uncertainty estimates.
Given the training data with labels in the two sets of inputs and outputs:
[TABLE]
we minimize a loss function that measures the discrepancy between the true solution operator and its DeepONet approximation :
[TABLE]
where denotes the set of evaluation points in the domain of . In practice, the analytical solution of the PDE is usually unavailable. Instead, we rely on numerical simulations from high-fidelity solvers, denoted by
[TABLE]
Accordingly, in practice we minimize the loss function
[TABLE]
The Physics-Informed Deep Operator Network (PI-DeepONet or PI-DON; referred to as PI-DON in this paper) [65] integrates the idea of physics-informed neural networks (PINN) [54] with the DeepONet framework. The PI-DON introduces the additional PDE residual loss function and boundary condition loss function. Consequently, the loss function for PI-DON is defined as
[TABLE]
where denotes the PDE residual loss, the boundary condition loss, and and are the corresponding weights for each term.
2.3 Multi-objective Optimization
To obtain an optimal PI-DON, i.e., to minimize the composite loss in (9), we must solve a multi-objective optimization problem. Multi-objective optimization problems frequently arise in scientific and engineering applications [62, 23, 32, 55, 51, 50, 66, 57], many of which involve high-dimensional search spaces. In this paper, we propose a refined version of the nondominated sorting genetic algorithm (refined NSGA-III) that builds upon the classical NSGA-III framework originally introduced by [7]. The refined NSGA-III retains the population-based evolutionary structure designed for multi-objective optimization while incorporating additional mechanisms to improve sampling diversity in the operator learning settings.
2.3.1 Nondominated Sorting Genetic Algorithm (NSGA-III)
To illustrate the NSGA-III algorithm, we begin with the following general setting. Given , an -objective function is defined as , where and for a given search space . Unlike single-objective optimization, there is usually no single solution that minimizes all objectives simultaneously. For comparison, we say that a solution dominates a solution , denoted , if and only if for all . A solution is Pareto-optimal if it is not strictly dominated by any other solution. The set of objective values of Pareto-optimal solutions is called the Pareto front. The NSGA-III algorithm, originally proposed by Deb and Jain [6], belongs to the evolutionary optimization framework and incorporates external mechanisms to support diversity maintenance, which can also help reduce the computational burden. Instead of exhaustively exploring the entire search space for Pareto-optimal solutions, the algorithm initiates multiple predefined, targeted searches. More specifically, NSGA-III initializes with a random population of size . In each subsequent iteration, one generates an offspring population of size using mutation and/or crossover operators. With a fixed population size, NSGA-III selects individuals for the next iteration from the combined pool of candidates. Since non-dominated solutions are used, a ranking scheme is employed in which dominance serves as the primary criterion for survival. Individuals that are not strictly dominated by any other member of the population are assigned rank 1. Subsequent ranks are determined recursively: each unranked individual that is strictly dominated only by those with ranks is assigned rank . Intuitively, the lower an individual’s rank, the more interesting it is. Denote be the set of individuals with rank , and let be the smallest integer such that
[TABLE]
All individuals with rank at most are retained for the next generation. In addition, individuals of rank must be selected so that the new population remains , allowing the next iteration to proceed.
2.3.2 Refined Nondominated Sorting Genetic Algorithm (RNSGA-III)
The vanilla NSGA-III algorithm (Algorithm 1), when applied to operator learning for PDE problems, aims to minimize the multi-objective loss in (9). In principle, NSGA-III does not inherently “contract” samples to a local region of the front; in practice, however, it is common to observe that the non-dominated set concentrates on a local sub-arc (or sub-face) of the attainable front. This phenomenon is not a theoretical property of the algorithm, but typically arises from several factors: (i) a mismatch between the fixed reference directions and the shape/location of the optimal Pareto front, such that many directions do not intersect the feasible front [24, 37]; (ii) poor objective normalization or strong scale differences, which are common in operator-learning settings, introducing bias into the perpendicular-distance association step [35]; (iii) high correlation among objectives (e.g., and ) reducing the effective dimensionality of the front [56]; and (iv) feasibility/constraint handling overpowering diversity pressure, causing selection to repeatedly favor a small subset of niches [71, 18]. Stochastic mini-batching of residual and boundary losses can further amplify these effects because the resulting noise may affect both dominance checks (comparisons that determine whether one solution is better than another) and association steps (assigning solutions to reference directions or niches). Figure 3 illustrates such a case, where vanilla NSGA-III produces sampling points (green) concentrated within a local sub-face of the attainable Pareto front.
To overcome this limitation, we propose refined NSGA-III, which integrates a refined Pareto sampling (RPS) strategy into the original NSGA-III framework. At each iteration, RPS first identifies solutions with sufficiently distinct objective values and then selects a representative subset based on pairwise distances between their outputs. The details of RPS are given in Algorithm 2. The red points in Figure 3 show that RPS successfully disperses solutions across the Pareto surface, in contrast to the vanilla NSGA-III sampling (green points).
2.4 Replica Exchange Stochastic Gradient Langevin dynamics
Replica exchange stochastic gradient Langevin dynamics (reSGLD) [8, deng2022adaptively, 11] extends the classical stochastic gradient Langevin dynamics (SGLD) [67] by incorporating the replica exchange mechanism from statistical physics. While SGLD is a scalable Markov Chain Monte Carlo (MCMC) method that interpolates between stochastic optimization and posterior sampling as the step size decreases, its efficiency is often limited by the pathological curvature of the underlying energy landscape that may lead to local trapping. To reconcile this issue, reSGLD employs multiple diffusion processes at different temperatures and operates occasional swaps which can enable high-temperature replicas to traverse energy barriers and facilitate exploration across distinct local modes. This mechanism which combines the scalability of SGLD with the efficiency of replica exchange leads to accelerated convergence, which has been both theoretically established [15, 13] and empirically validated [8].
To illustrate the reSGLD algorithm, we first recall the negative log-posterior (energy function) given by
[TABLE]
where is the prior and the second term corresponds to the complete-data log-likelihood. For large datasets, direct evaluation is expensive and sometimes prohibitive. A common remedy is to approximate it using a mini-batch , which yields an unbiased stochastic energy estimator:
[TABLE]
The stochastic gradient Langevin dynamics (SGLD) iterates as
[TABLE]
where is the learning rate and is a standard Gaussian noise. As , SGLD converges to the invariant distribution . A naive extension of replica exchange Langevin dynamics (reLD) to the stochastic gradient setting leads to the following two-chain system:
[TABLE]
where are different temperatures. Replica exchange is then attempted by swapping the two chains with a naive stochastic swapping rate
[TABLE]
However, the stochastic estimates enter (15) through a nonlinear exponential, which introduces significant bias in the swap acceptance probability. This bias breaks detailed balance and results in poor approximation of the target distribution. To address this, a correction is introduced by modeling the stochastic energy difference as a Gaussian perturbation,
[TABLE]
where . This yields the corrected swap acceptance rate
[TABLE]
which is now an unbiased estimator of the exact acceptance probability. The correction term compensates for the variance of the mini-batch noise and restores the validity of the replica exchange procedure.
2.5 Uncertainty Quantification
Uncertainty quantification (UQ) is critical to ensure the reliability of machine learning models in scientific applications. The reSGLD mechanism inherent in the Morephy-Net framework provides a natural and efficient way to perform Bayesian UQ. By sampling the posterior distribution of the model parameters using reSGLD, we obtain an ensemble of model predictions that can be used to quantify uncertainty. Specifically, the predictive mean and variance are given by and variance where denotes the sample size and is the prediction from the th sampled model.
3 Numerical Results
In this section, we will show the numerical test results for the proposed Morephy-Net in both forward and inverse problem setting. Specifically, we consider two different test problems: the one-dimensional viscous Burgers’ equation and the one-dimensional time-fractional mixed diffusion-wave equations (TFMDWEs). The inverse problem settings are formulated to identify the viscosity parameter in the Burgers’ equation and the fractional derivative order in the TFMDWEs. To avoid ambiguity, we provide a summary in Table 1 that lists all the models used for comparison with Morephy-Net in the numerical tests, along with their full names.
3.1 Burgers’ Equation
The one-dimensional viscous Burgers’ equation is a nonlinear partial differential equation frequently used as a benchmark [44, 45, 54, 47]. The equation with Dirichlet boundary conditions, is defined on the spatial domain and temporal domain given by the following:
[TABLE]
Here, represents the solution over space and time, and is the viscosity chosen to be .
Remark 1*.*
The Morephy-Net employs an ensemble strategy, which can be computationally expensive. This overhead, however, can be alleviated through parallelization, as each ensemble member is trained independently. The number of tasks/actors launched equals the population size. Furthermore, the incorporation of the evolutionary multi-objective optimization algorithm (NSGA-III) accelerates the convergence of each ensemble member relative to a standard DeepONet. As a result, while the computational cost of Morephy-Net is higher, it is not prohibitive. Table 2 list the wall time for training DeepONet and the proposed Morephy-Net. The wall time is recorded until the model reaches a convergence plateau. All experiments were performed on a linux machine with an NVIDIA H100 GPU (80G).
3.1.1 Hyperparameters
To ensure a rigorous and unbiased comparison between DeepONet, PI-DON, and PI-FDON, we standardized the core architecture and training protocol across all models. For each network, both the branch and trunk sub-networks were constructed as Multi-Layer Perceptrons (MLPs) featuring a depth of 8 hidden layers and a width of 50 neurons per layer. The training regimen was also held constant for all experiments. We employed the Adam optimizer with a learning rate of 5e-4. To mitigate overfitting and improve training efficiency, an early stopping criterion was implemented, which terminated the training process if the validation loss did not demonstrate significant improvement over a predefined number of epochs.
3.1.2 Forward Problem
We solve the forward problem for the 1D viscous Burgers’ equation to evaluate the performance of four models: DeepONet, PI-DON, and PI-FDON. The goal is to predict the solution for the entire spatio-temporal domain , given the initial and boundary conditions. The visualization results are presented in Figure 4. As shown in first row LABEL:Burgers_3d_forward, all four models successfully learn the overall dynamics of the Burgers’ equation. The 3D spatiotemporal plots demonstrate that each model captures the smooth initial state and the subsequent formation of a steep shock front as time progresses. A more detailed and analysis is presented in second row LABEL:Burgers_forward_UQ_result presents solution snapshots at time and , revealing significant performance differences in resolving the shock front discontinuity. The prediction from Morephy-Net, shown as a blue shaded region representing the confidence interval, almost perfectly overlaps the benchmark solution. It captures the sharp shock wave with no visible overshooting or spurious oscillations, yielding a stable and physically plausible result. In the contrast, the other three models struggle to resolve the sharp gradient. As highlighted in the insets, these models exhibit non-physical oscillations near the shock, an issue that becomes more pronounced as the shock steepens over time. The error distributions are plotted in third row LABEL:Burgers_forward_result_error. The errors for DeepOnet, PI-DON, and PI-FDON are heavily concentrated along the trajectory of the shock wave. The error map for Morephy-Net is visibly lighter, indicating a lower error magnitude across the domain and reinforcing its superior performance in shock capturing. The quantitative performance of the models is summarized in Table 3 to clearly illustrate the model difference. As from our observation, the proposed PI-FDON shows higher accuracy than DON and PI-DON model. Crucially, the Morephy-Net model demonstrates the best performance overall. It better satisfies the residual constraint and boundary constraint and also achieves the lowest relative error.
3.1.3 Inverse Problem
We now focus on the inverse problem for Burgers’ equation, a more challenging task where the viscosity parameter, , is treated as an unknown variable. Our objectives are to infer the correct value of the viscosity parameter and accurately reconstruct the full continuous solution field over the entire spatial-temporal domain. This type of parameter inference poses a significant challenge for deep learning models. The complex and non-convex loss landscapes associated with inverse problems mean that the optimization process can easily become trapped in poor local minima, leading to inaccurate parameter estimates and solution reconstructions. We use the observational data serve as anchors during the training process, ensuring that the network remains consistent with the underlying physical dynamics. In the inverse problem, the total loss function is composed of three components: the PDE residual loss, the boundary and initial condition loss, and the data loss.
In Figure 5, the first row LABEL:Burgers_3d_inverse, shows the reconstruction result in the spatial-temporal solution field. The second row LABEL:Burgers_inverse_UQ_result shows the snapshot of the result at three distinct timesteps. As shown in this figure, the prediction of DeepONet shows the most severe oscillations, above the benchmark at and but below the benchmark at . PI-DON and Pi-FDON also fail to produce a clean, stable prediction. However, Morephy-Net demonstrates best performance in this scenario. The blue shaded region shows the confidence interval and it almost perfectly overlaps the black benchmark line. It has a great improvement when compare with the other three models. The third row LABEL:Burgers_inverse_result_error displays a comparison of error heatmaps for four models. Each plot visualizes the absolute error between the model’s prediction and the true solution over a 2D domain of space and time . As we can observe from the figure that the Morephy-Net performs the best accuracy. the lighter bright line along the center indicates a lower error that is concentrated where the shock wave occurs.
3.1.4 Inverse Problem With Noisy Data
We now investigate a more challenging version of the inverse problem to assess model robustness. The sparse training data are corrupted by adding noise drawn from a Gaussian distribution with a zero mean. We introduce a substantial noise level by setting the noise’s standard deviation to 50% of the standard deviation of the true data, evaluating the models’ capacity to learn the governing physics from corrupted observations.
Figure 6 shows the reconstructed solution and error in the spatial-temporal domain. In the second row LABEL:Burgers_inverse_noisy_UQ_result, the Morephy-Net model shows good performance and reliability. The prediction of Morephy-Net is visualized as a blue shaded area, representing the 95% confidence interval in second row LABEL:Burgers_inverse_noisy_UQ_result. As clearly shown in the zoomed in figure, this confidence interval covers black benchmark line from to 0. In the contrast, the other three models fail to accurately capture the shock. Their point predictions exhibit significant deviations and spurious oscillations near the discontinuity. The spatial-temporal error distribution for four models are shown in the third row LABEL:Burgers_inverse_noisy_result_error. The Morephy-Net error map is lighter than others, indicates a substantially lower prediction error across the entire space-time domain. Table 5 summarizes the results for the challenging inverse problem with noisy data, where Morephy-Net shows the most robust performance. It not only achieves the best fit to the physical and data constraints but also yields the most accurate parameter prediction (). Consequently, it achieves a significantly lower relative error of 0.0744 compare with Pi-FDON of 0.1249.
3.2 Time-fractional Mixed Diffusion-Wave Equations (TFMDWEs)
In this section, we consider the time-fractional mixed diffusion-wave equations (TFMDWEs), which generalize classical diffusion and wave equations by incorporating fractional-order time derivatives [39, 14, 40]. The TFMDWEs with Dirichlet boundary conditions are defined as follows:
[TABLE]
where the fractional order yields: and the forcing term is explicitly defined as:
[TABLE]
with denoting the Gamma function. This equation introduces fractional-order temporal dynamics, combining features of both diffusion and wave phenomena. The fractional order controls the transition between diffusive and wave-like behaviors, making TFMDWEs particularly useful in modeling complex systems exhibiting anomalous transport and non-local temporal interactions.
3.2.1 Hyperparameters
To ensure a fair comparison between DeepONet, PI-DON, and PI-FDON, we standardized the core architecture and training protocol across all models. For each network, both the branch and trunk sub-networks were constructed as Multi-Layer Perceptrons (MLPs) featuring a depth of 3 hidden layers and a width of 100 neurons per layer. The training regimen was also held constant for all experiments. We employed the Adam optimizer with a learning rate of 5e-4. The early stopping criterion was implemented to mitigate overfitting and improve training efficiency.
3.2.2 Forward Problem
This section details the performance evaluation of the four models—Morephy-Net, DeepOnet, PI-DON, and PI-FDON—on the forward problem. A comprehensive comparison against the benchmark solution is presented in Figure 7, which includes spatiotemporal field reconstructions, solution cross-sections, and quantitative error distributions. The collective results demonstrate the superior accuracy and reliability of the Morephy-Net architecture. Figure LABEL:FPDE_forward_result provides a comparison of the full spatiotemporal solution fields. The predictions generated by Morephy-Net, PI-DON, and PI-FDON are visually indistinguishable from the benchmark, indicating that they have successfully learned the underlying dynamics of the system. In contrast, the DeepOnet model exhibits a noticeable deviation, with a significant error concentrated near the initial condition at , suggesting a failure to correctly capture the initial state of the system. Figures LABEL:FPDE_forward_UQ_x_result and Figure LABEL:FPDE_forward_UQ_t_result display solution cross-sections at various fixed spatial locations () and temporal snapshots () respectively. The 95% confidence interval of the Morephy-Net prediction, represented by the blue shaded region, consistently envelops the ground truth benchmark line across all subplots. This demonstrates that the true solution lies within the model’s predicted uncertainty bounds. On the contrary, we cab observe the predictions result of other models are lower than the benchmark at fixed location shown in LABEL:FPDE_forward_UQ_x_result and predilections are deviate from the benchmark shown in LABEL:FPDE_forward_UQ_t_result. Figure LABEL:FPDE_forward_result_error shows the error heatmaps. The error map for Morephy-Net is visibly lighter in color across the entire spatiotemporal domain, indicating a significantly lower overall prediction error compared to the other models.
Table 6 demonstrates a consistent trend of improvement across the operator-learning architectures. Morephy-Net stands out as the most accurate model. It achieves the lowest relative error of 0.0935, a substantial improvement over the best-performing operator-learning baseline. It also records the lowest initial and boundary condition (IBC) loss, indicating superior satisfaction of the problem constraints.
3.2.3 Inverse Problem
We consider the inverse TFMDWE setting and train the model to infer the fractional order and reconstruct the solution. Figure 8 visualizes the results. In the first row, we compare the reconstructed spatiotemporal fields from different models with the benchmark. Morephy-Net matches the benchmark closely, whereas the other three models exhibit noticeable mismatches near and . The second and third rows show 2D projections of the solution at different fixed locations and times. These plots indicate that Morephy-Net consistently covers the benchmark within its 95% confidence interval (dark-blue shading). The fourth row shows the error heatmaps for the four models, where Morephy-Net exhibits smaller errors overall.
4 Conclusion and Future Work
In this paper, we introduced Morephy-Net, a physics-informed operator-learning framework that integrates evolutionary multi-objective optimization with replica-exchange stochastic sampling. By treating data/operator and physics residual losses as separate objectives and searching the Pareto front, Morephy-Net removes the need for hand-tuned loss weights. Meanwhile, reSGLD improves global exploration of the parameter space and enables principled Bayesian uncertainty quantification. Numerical experiments on the Burgers’ equation and the time-fractional mixed diffusion–wave equation demonstrate that Morephy-Net achieves higher accuracy and stronger robustness to noisy observations than standard operator-learning baselines (e.g., DeepONet, PI-DON, and PI-FDON), in both forward and inverse settings. In addition to improved point estimates, Morephy-Net provides calibrated uncertainty information—a capability that is typically absent in deterministic operator-learning pipelines—which is critical for reliable scientific inference and decision-making.
There are several promising future directions. First, extending the proposed multi-objective replica-exchange physics-informed setting to evolutionary Kolmogorov–Arnold networks [5, 36] may improve both expressiveness and interpretability. Second, Morephy-Net’s inherent uncertainty quantification (UQ) makes it a natural candidate for data assimilation (DA), where accurate state estimation under model uncertainty is crucial for high-dimensional dynamical systems [52, 29, 3, 16, 38, 63]. Since DA often requires many repeated forecasts in an ensemble or iterative framework, the operator-learning structure of Morephy-Net can serve as a computationally efficient surrogate for the underlying dynamics; combined with UQ, it is particularly well-suited for real-time assimilation with noisy observations. Third, applying Morephy-Net to turbulent flows governed by the Navier–Stokes equations (NSE) offers an exciting opportunity to test its robustness and scalability in highly nonlinear and multiscale systems [48, 22, 70].
Github codes
The source code for Morephy-Net is available at: https://github.com/jeremylu916/MORPHY-Net
Acknowledgment
We would like to thank the support of National Science Foundation (DMS-2533878, DMS-2053746, DMS-2134209, ECCS-2328241, CBET-2347401 and OAC-2311848), and U.S. Department of Energy (DOE) Office of Science Advanced Scientific Computing Research program DE-SC0023161, the SciDAC LEADS Institute, and DOE–Fusion Energy Science, under grant number: DE-SC0024583.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] L. C. Berselli, T. Iliescu, and W. J. Layton (2006) Mathematics of large eddy simulation of turbulent flows . Springer . Cited by: §1 .
- 2[2] S. L. Brunton and J. N. Kutz (2024) Promising directions of machine learning for partial differential equations . Nature Computational Science 4 ( 7 ), pp. 483–494 . Cited by: §1 .
- 3[3] N. Chen and A. J. Majda (2020) Efficient nonlinear optimal smoothing and sampling algorithms for complex turbulent nonlinear dynamical systems with partial observations . Journal of Computational Physics 410 , pp. 109381 . Cited by: §4 .
- 4[4] T. Chen and H. Chen (1995) Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems . IEEE Transactions on Neural Networks 6 ( 4 ), pp. 911–917 . Cited by: §2.2 .
- 5[5] G. Cruz, B. Renczes, M. C. Runacres, and J. Decuyper (2025) State-space kolmogorov arnold networks for interpretable nonlinear system identification . IEEE Control Systems Letters . Cited by: §4 .
- 6[6] K. Deb and H. Jain (2013) An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part i: solving problems with box constraints . IEEE transactions on evolutionary computation 18 ( 4 ), pp. 577–601 . Cited by: §2.3.1 .
- 7[7] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan (2002) A fast and elitist multiobjective genetic algorithm: nsga-ii . IEEE transactions on evolutionary computation 6 ( 2 ), pp. 182–197 . Cited by: §1 , §2.3 .
- 8[8] W. Deng, Q. Feng, L. Gao, F. Liang, and G. Lin (2020) Non-convex learning via replica exchange stochastic gradient mcmc . In International Conference on Machine Learning , pp. 2474–2483 . Cited by: §2.4 .
