Time-series learning of latent-space dynamics for reduced-order model closure
Romit Maulik, Arvind Mohan, Bethany Lusch, Sandeep Madireddy, Prasanna Balaprakash, Daniel Livescu

TL;DR
This paper compares LSTM and NODE neural networks for learning latent-space dynamics in reduced-order models of the viscous Burgers equation, showing their effectiveness in system closure without intrusive methods.
Contribution
It demonstrates that LSTMs and NODEs can effectively learn system closure in reduced-order models, outperforming traditional Galerkin projection methods.
Findings
LSTMs and NODEs accurately reproduce unresolved scale effects.
Time-series learning techniques implicitly leverage memory kernels.
Neural methods outperform intrusive Galerkin projection in test cases.
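Table 1: Hyperparameter search ranges and optimal values for the LSTM (Burgers' turbulence test case).

| Hyperparameter | Type | Starting value | Ending value | Optimal |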
|---|---|---|---|---|
| Sequence size | Integer | 5 | 30 | 30 |
| Neurons | Integer | 5 | 100 | 73 |
| Learning rate | Real | 0.0001 | 0.1 | 0.0005 |
| Momentum | Real | 0.99 | 0.999 | 0.9988 |
| Epochs | Integer | 100 | 1000 | 317 |
| Batch Size | Integer | 5 | 30 | 8 |

Table 2: Hyperparameter search ranges and optimal values for the NODE (advecting shock test case).

| Hyperparameter | Type | Starting value | Ending value | Optimal |
|---|---|---|---|---|
| Sequence size | Integer | 5 | 30 | 5 |
| Neurons | Integer | 10 | 100 | 82 |
| Learning rate | Real | 0.0001 | 0.1 | 0.0074 |
| Momentum | Real | 0.99 | 0.999 | 0.9983 |
| Epochs | Integer | 200 | 1200 | 546 |
| Batch Size | Integer | 5 | 30 | 21 |
Time-series learning of latent-space dynamics for reduced-order model closure
Romit Maulik
Arvind Mohan
Bethany Lusch
Sandeep Madireddy
Prasanna Balaprakash
Daniel Livescu
Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, IL 60439, USA
Center for Nonlinear Studies/CCS-2 Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL 60439, USA
Abstract
We study the performance of long short-term memory networks (LSTMs) and neural ordinary differential equations (NODEs) in learning latent-space representations of dynamical equations for an advection-dominated problem given by the viscous Burgers equation. Our formulation is devised in a non-intrusive manner with an equation-free evolution of dynamics in a reduced space with the latter being obtained through a proper orthogonal decomposition. In addition, we leverage the sequential nature of learning for both LSTMs and NODEs to demonstrate their capability for closure in systems which are not completely resolved in the reduced space. We assess our hypothesis for two advection-dominated problems given by the viscous Burgers equation. It is observed that both LSTMs and NODEs are able to reproduce the effects of the absent scales for our test cases more effectively than intrusive dynamics evolution through a Galerkin projection. This result empirically suggests that time-series learning techniques implicitly leverage a memory kernel for coarse-grained system closure as is suggested through the Mori-Zwanzig formalism.
Keywords: ROMs, LSTMs, Neural ODEs, Closures
Journal: Elsevier
1 Introduction
High-fidelity simulations of systems characterized by nonlinear partial differential equations represent immense computational expenditure and are prohibitive for decision-making tasks for applications. To address this issue, there has recently been a significant quantity of research into the reduced-order modeling (ROM) of such systems to reduce the degrees of freedom of the forward problem to manageable magnitudes [1, 2, 3, 4, 5, 6, 7]. As such, this field finds extensive application in control [8], multi-fidelity optimization [9] and uncertainty quantification [10, 11] among others. However, ROMs are limited in how they handle nonlinear dependence and perform poorly for complex physical phenomena which are inherently multiscale in space and time [12, 13, 14, 15]. To address this issue, researchers continue to search for efficient and reliable ROM techniques for transient nonlinear systems.
A common ROM development procedure may be described by the following tasks:
1. Reduced basis identification.
2. Nonlinear dynamical system evolution in the reduced basis.
3. Reconstruction in full-order space for assessments.
The first two items of the aforementioned schema individually constitute areas of extensive investigation, and there are studies which attempt to combine these into one optimization problem as well. In this investigation, we utilize conventional ideas for reduced basis identification with the use of the proper orthogonal decomposition (POD) for finding the optimal global basis. We proceed by considering a parameterized time-dependent partial differential equation given (in the full-order space) by
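$$\dot{u}(x,t,\nu) + \mathcal{N}[u(x,t,\nu)] + \mathcal{L}[u(x,t,\nu);\nu] = 0, \quad (x,t,\nu) \in \Omega \times \mathcal{T} \times \mathcal{P},$$
where $\Omega \subset \mathbb{R}$, $\mathcal{T} = [0,T]$, $\mathcal{P} \subset \mathbb{R}$ and $\mathcal{N}$, $\mathcal{L}$ are non-linear and linear operators respectively. Our system is characterized by a solution field $u: \Omega \times \mathcal{T} \times \mathcal{P} \rightarrow \mathbb{R}$ and appropriately chosen initial as well as boundary conditions. We assume that our system of equations can be solved in space-time on a discrete grid resulting in the following system of parameterized ODEs
$$\dot{\mathbf{u}}_h(t,\nu) + \mathbf{N}_h[\mathbf{u}_h(t,\nu)] + \mathbf{L}_h[\mathbf{u}_h(t,\nu);\nu] = 0, \quad (t,\nu) \in \mathcal{T} \times \mathcal{P},$$
where $\mathbf{u}_h: \mathcal{T} \times \mathcal{P} \rightarrow \mathbb{R}^{N_h}$ is a discrete solution and $N_h$ is the number of spatial degrees of freedom. Specifically, our problem is given by the viscous Burgers’ equation with periodic boundary conditions which can be represented as
$$\dot{u} + u \frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}, \quad u(x,0) = u_0(x), \quad x \in [0,L].$$
It is well known that the above equations are capable of generating discontinuous solutions, even if initial conditions are smooth, when $\nu$ is sufficiently small, due to advection-dominated behavior. We can then proceed to project our governing equations onto a space of reduced orthonormal bases for inexpensive forward solves of the dynamics.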
1.1 Proper orthogonal decomposition
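In this section, we review the proper orthogonal decomposition (POD) technique for the construction of a reduced basis [16, 17]. The interested reader may also find an excellent explanation of POD and its relationship with other dimension-reduction techniques in [18]. The POD procedure is tasked with identifying a space
$$\mathbf{X}^f = \text{span}\{\boldsymbol{\vartheta}^1, \dots, \boldsymbol{\vartheta}^f\}$$
which approximates snapshots optimally with respect to the $L^2$ norm. The process of $\boldsymbol{\vartheta}$ generation commences with the collection of snapshots in the snapshot matrix
$$\mathbf{S} = [\hat{\mathbf{u}}_h^1 \,|\, \hat{\mathbf{u}}_h^2 \,|\, \cdots \,|\, \hat{\mathbf{u}}_h^{N_s}] \in \mathbb{R}^{N_h \times N_s},$$
where $\hat{\mathbf{u}}_h^i \in \mathbb{R}^{N_h}$ corresponds to an individual snapshot in time (for a total of $N_s$ snapshots) of the discrete solution domain with the mean value removed, i.e.,
$$\hat{\mathbf{u}}_h^i = \mathbf{u}_h^i - \bar{\mathbf{u}}_h, \quad \bar{\mathbf{u}}_h = \frac{1}{N_s} \sum_{i=1}^{N_s} \mathbf{u}_h^i,$$
with $\bar{\mathbf{u}}_h$ being the time-averaged solution field. Our POD bases can then be extracted efficiently through the method of snapshots where we solve an eigenvalue problem for a correlation matrix $\mathbf{C} = \mathbf{S}^T \mathbf{S} \in \mathbb{R}^{N_s \times N_s}$,
$$\mathbf{C} \mathbf{W} = \mathbf{W} \boldsymbol{\Lambda},$$
where $\boldsymbol{\Lambda} = \text{diag}\{\lambda_1, \lambda_2, \dots, \lambda_{N_s}\}$ is the diagonal matrix of eigenvalues and $\mathbf{W} \in \mathbb{R}^{N_s \times N_s}$ is the eigenvector matrix. Our POD basis matrix can then be obtained by
$$\boldsymbol{\vartheta} = \mathbf{S} \mathbf{W} \in \mathbb{R}^{N_h \times N_s},$$
with its columns normalized to unit length. In practice a reduced basis $\boldsymbol{\psi} \in \mathbb{R}^{N_h \times N_r}$ is built by choosing the first $N_r$ columns of $\boldsymbol{\vartheta}$ for the purpose of efficient ROMs where $N_r \ll N_s$. This reduced basis spans a space given by
$$\mathbf{X}^r = \text{span}\{\boldsymbol{\psi}^1, \dots, \boldsymbol{\psi}^{N_r}\}.$$
The coefficients of this reduced basis (which capture the underlying temporal effects) may be extracted as
$$\mathbf{A} = \boldsymbol{\psi}^T \mathbf{S} \in \mathbb{R}^{N_r \times N_s}.$$
The POD approximation of our solution is then obtained via
$$\hat{\mathbf{S}} = [\tilde{\mathbf{u}}_h^1 \,|\, \tilde{\mathbf{u}}_h^2 \,|\, \cdots \,|\, \tilde{\mathbf{u}}_h^{N_s}] \approx \boldsymbol{\psi} \mathbf{A} \in \mathbb{R}^{N_h \times N_s},$$
where $\tilde{\mathbf{u}}_h^i$ corresponds to the POD approximation to $\hat{\mathbf{u}}_h^i$. The optimal nature of reconstruction may be understood by defining the relative projection error
$$\frac{\sum_{i=1}^{N_s} \| \hat{\mathbf{u}}_h^i - \tilde{\mathbf{u}}_h^i \|^2}{\sum_{i=1}^{N_s} \| \hat{\mathbf{u}}_h^i \|^2} = \frac{\sum_{i=N_r+1}^{N_s} \lambda_i}{\sum_{i=1}^{N_s} \lambda_i},$$
which shows that reconstruction accuracy increases as more POD bases are retained. As shall be explained later, the coefficient matrix $\mathbf{A}$ forms our training data for time-series learning.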
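The method of snapshots described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names and the toy moving-front data are assumptions made here, and the basis columns are rescaled by the square root of the eigenvalues so that they are orthonormal.

```python
import numpy as np

def pod_method_of_snapshots(snapshots, n_modes):
    """POD of a (n_dof, n_snap) snapshot array via the method of snapshots.

    Returns the temporal mean, the truncated orthonormal basis, the modal
    coefficients and all eigenvalues (sorted in descending order).
    """
    mean = snapshots.mean(axis=1, keepdims=True)   # time-averaged field
    S = snapshots - mean                           # mean-removed snapshot matrix
    C = S.T @ S                                    # correlation matrix, (n_snap, n_snap)
    eigvals, W = np.linalg.eigh(C)                 # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, W = eigvals[order], W[:, order]
    # Columns of S @ W have norm sqrt(eigenvalue); rescale them to unit length.
    basis = (S @ W[:, :n_modes]) / np.sqrt(eigvals[:n_modes])
    coeffs = basis.T @ S                           # (n_modes, n_snap) time series
    return mean, basis, coeffs, eigvals

def relative_projection_error(snapshots, mean, basis, coeffs):
    """Squared Frobenius error of the truncated reconstruction, normalized."""
    S = snapshots - mean
    return np.linalg.norm(S - basis @ coeffs) ** 2 / np.linalg.norm(S) ** 2

# Toy data (an assumption for illustration): a smooth front moving to the right.
x = np.linspace(0.0, 1.0, 128)
t = np.linspace(0.0, 2.0, 60)
U = np.tanh((x[:, None] - 0.25 - 0.25 * t[None, :]) / 0.05)

mean, basis, coeffs, eigvals = pod_method_of_snapshots(U, n_modes=3)
print("projection error with 3 modes:",
      relative_projection_error(U, mean, basis, coeffs))
```

The rows of `coeffs` are the modal time series that a time-series learner would be trained on; the printed error equals the ratio of the discarded eigenvalues to the total, which is one way to check how severe a given truncation is.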
1.2 Galerkin-projection onto reduced space
The orthogonal nature of the POD basis may be leveraged for a Galerkin projection onto the reduced basis. We start by revisiting Equation (1) written in the form of an evolution equation for fluctuation components i.e.,
[TABLE]
which can be expressed in the reduced basis as
[TABLE]
where corresponds to the temporal coefficients at one time instant of the system evolution (i.e., equivalent to a particular column of ). The orthogonal nature of the reduced basis can be leveraged to obtain
[TABLE]
which we denote the POD Galerkin-projection formulation (POD-GP). Note that we have assumed that the residual generated by the truncated representation of the full-order model is orthogonal to the reduced basis. We note that it is precisely this assumption that necessitates closure. From the point of view of the Burgers’ equations given in Equation (5), our POD-GP implementation is given as
[TABLE]
where is one component of and where
[TABLE]
are operators which can be computed offline (i.e, ) and where we have defined an inner-product by
[TABLE]
with and , the operators stemming from the Burgers’ equation. It will be discussed and demonstrated later, that the absence of higher-basis nonlinear interactions causes errors in the forward evolution of this system of equations. Note that the POD-GP essentially consists of coupled ODEs and is solved by a standard total-variation diminishing third-order Runge-Kutta method. The reduced degrees of freedom lead to very efficient forward solves of the problem. Note that this transformed problem has initial conditions given by
[TABLE]
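The forward solve of a Galerkin-projected system with a total-variation diminishing third-order Runge-Kutta method can be sketched as follows. This is a hedged illustration only: the constant, linear and quadratic operators `b`, `L` and `N` are hypothetical placeholders (not the Burgers operators of the paper), and the Shu-Osher form of the TVD RK3 scheme is assumed.

```python
import numpy as np

def galerkin_rhs(a, b, L, N):
    """Generic quadratic ROM right-hand side: b_k + L_ki a_i + N_kij a_i a_j."""
    return b + L @ a + np.einsum("kij,i,j->k", N, a, a)

def tvd_rk3_step(rhs, a, dt):
    """One step of the third-order TVD Runge-Kutta scheme (Shu-Osher form)."""
    a1 = a + dt * rhs(a)
    a2 = 0.75 * a + 0.25 * (a1 + dt * rhs(a1))
    return a / 3.0 + (2.0 / 3.0) * (a2 + dt * rhs(a2))

def integrate(rhs, a0, dt, n_steps):
    """March the modal coefficients forward in time from the initial condition."""
    traj = np.empty((n_steps + 1, a0.size))
    traj[0] = a0
    for n in range(n_steps):
        traj[n + 1] = tvd_rk3_step(rhs, traj[n], dt)
    return traj

# Hypothetical three-mode operators, chosen only so that the example runs.
rng = np.random.default_rng(0)
b = np.zeros(3)
L = -np.eye(3)
N = 0.01 * rng.standard_normal((3, 3, 3))
a0 = np.array([1.0, 0.5, -0.25])

traj = integrate(lambda a: galerkin_rhs(a, b, L, N), a0, dt=0.01, n_steps=200)
print("coefficients at final time:", traj[-1])
```

In an actual POD-GP solve the operators would be assembled offline from the basis functions, and the online cost per step would scale with the number of retained modes rather than the full-order grid.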
1.3 Contribution
In this article, we investigate strategies to bypass the POD-GP process with an intrusion-free (or equation-free) learning of dynamics in reduced space. We deploy machine learning strategies devised for sequential data learning on time-series samples extracted from the true dynamics of the problems considered. In recent literature, there has been a considerable interest in the utility of machine learning techniques for effective ROM construction. Data-driven techniques have been used in various phases of ROM techniques such as in reduced basis construction [19], augmented dynamics evolution [20, 21, 14, 5, 22, 15, 23, 6, 24, 25] and system identification [26, 27, 28, 29, 30].
In this article, we study the utility of data-driven techniques to make a posteriori predictions for state evolution in reduced space with assessments made on their ability to reconstruct transient characteristics of the full-order solution. It is also observed that the ability to learn a time series (possible through the in-built design of memory and an assumption of non-i.i.d. sequential data in the learning) leads to an implicit closure whereby the effects of uncaptured frequencies are retained, drawing parallels to a Mori-Zwanzig formalism. In related work, the study in [31] utilizes recurrent neural networks to explicitly specify a sub-grid stress model for large eddy simulation with the lower frequency evolution controlled by coarse-grained partial differential equations. Our framework is also similar to [32] where an LSTM is utilized to learn a parametric memory kernel for explicit closure of nonlinear PDEs. The present study can be considered a non-intrusive counterpart of these investigations. We also detail a formalism for efficient machine learning architecture selection using scalable Bayesian optimization. Our test problems are given by the advection-dominated viscous Burgers equation [14] with a moving shock as well as a pseudo-turbulence test case denoted ‘Burgulence’ [33, 34] showing the characteristic $k^{-2}$ scaling in wavenumber ($k$) space.
2 Latent-space learning
In this section, we outline our machine learning techniques for latent-space time-series learning. We study two techniques built around the premise of preserving memory effects within their architecture - neural ordinary differential equations (NODEs) and long short-term memory networks (LSTMs). Both frameworks are tasked with predicting the evolution of the POD modal coefficients over time.
2.1 Neural ordinary differential equations
In recent times, there have been several studies which have interpreted residual neural networks using dynamical systems theory [35, 36, 37, 38]. The framework of the NODE [39] envisions the learning of the latent state $\mathbf{a}$ over time as
$$\frac{d\mathbf{a}(t)}{dt} = f(\mathbf{a}(t), t, \theta),$$
where $\theta$ denotes the user-defined model parameters. The learning can be thought of as a continuous backpropagation through time, i.e., one where there is an infinite number of layers in the network with the input layer given by the initial state and the output layer given by the final state. Therefore the NODE approximates the latent-space evolution as an ordinary differential equation which is continuous through time in a manner similar to the Galerkin projection. The function $f$ in this study is represented by a neural network with a single 40-neuron hidden layer and a tan-sigmoid activation where $\theta \in \mathbb{R}^{N_\theta}$ and $N_\theta$ is the number of parameters of the neural network architecture. Note that the assumption of a single hidden layer architecture for the right-hand side of the latent-space ODE allows for upper-bound guarantees given by the universal approximation theorem (Barron, 1993) although more complicated dynamics may require deeper architectures. Readers are referred to the work of Chen et al. [39] for a detailed discussion of the neural ODE and its utility in learning sequential data.
The forward propagation of information through time (i.e., from the initial to the final time) is performed through a standard ODE solver (in this case a first-order accurate Euler method) whereas backpropagation of errors is performed through the backward solve of an adjoint system given by
[TABLE]
where is the augmented state vector given by
[TABLE]
with the scalar loss evaluated at the final time of the preceding forward propagation. Each loss calculation is followed by the backward solve of Equation (31) (which may be interpreted as continuous backpropagation in time) to calculate the adjoint state, which can then be used to determine the gradient of the loss with respect to the model parameters. This value of the gradient can then be used to update the parameters using an optimization algorithm. In this article, we utilize RMSProp for our loss minimization with a learning rate of 0.01 and a momentum parameter of 0.9. Instead of performing the forward deployment of the NODE and backpropagation of the model errors for the entire domain, we utilize 1000 samples of our total data as our training and validation dataset using the technique detailed in the original article to speed up training. Each sample is given by a sequence of 10 timesteps. The training, for each epoch, is performed using 10 randomly chosen samples (i.e., our batch size is 10) for the calculation of parameter gradients. The final gradient deployed for model training is averaged across this batch. A set of samples (20% of the total 1000), chosen randomly before training, is kept aside from the learning process to assess validation errors. Note that validation errors are also characterized by final timestep loss (i.e., at timestep 10 of each batch) thereby incorporating the degree of error accumulation due to an inaccurately trained model at that epoch. The best model corresponds to the lowest validation loss (averaged across all validation samples). We do not utilize a separate dataset for the purpose of testing.
For the purpose of testing, we note that all assessments for the problems are through forward (or a posteriori) deployment. In other words, the NODE is specified an initial condition and then deployed to obtain state vectors using an ODE forward solve until the final time. The prediction at each time step is obtained by the Euler integration which requires the knowledge of previous state alone. Note that apart from the first prediction by NODE (which utilizes the initial condition), state predictions are recursively utilized for predicting the future. Therefore, testing may be assumed to be a long-term predictive test of the model learning in the presence of deployment error.
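The a posteriori deployment described above can be sketched as a recursive forward Euler rollout of a single-hidden-layer network. This is a minimal sketch under stated assumptions: the weights are random (untrained), the names `init_params`, `f_theta` and `euler_rollout` are hypothetical, and `tanh` is used as the tan-sigmoid activation.

```python
import numpy as np

def init_params(n_modes, n_hidden=40, seed=0):
    """Random (untrained) weights for a single-hidden-layer right-hand side."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(n_hidden, n_modes)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_modes, n_hidden)),
        "b2": np.zeros(n_modes),
    }

def f_theta(a, p):
    """Learned latent-space right-hand side da/dt = f(a; theta)."""
    return p["W2"] @ np.tanh(p["W1"] @ a + p["b1"]) + p["b2"]

def euler_rollout(a0, p, dt, n_steps):
    """Recursive deployment: each prediction is fed back as the next input."""
    traj = np.empty((n_steps + 1, a0.size))
    traj[0] = a0
    for n in range(n_steps):
        traj[n + 1] = traj[n] + dt * f_theta(traj[n], p)
    return traj

params = init_params(n_modes=3)
a0 = np.array([1.0, -0.5, 0.2])          # initial modal coefficients (toy values)
trajectory = euler_rollout(a0, params, dt=0.01, n_steps=100)
print(trajectory.shape)
```

Only the initial condition comes from data; every later state is built from the model's own previous output, which is why this kind of test exposes accumulated deployment error.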
2.2 Long short-term memory networks
The long short-term memory (LSTM) network was introduced to consider time-delayed processes where events further back in the past may potentially affect predictions for the current location in the sequence. The basic equations of the LSTM in our context for an input variable $\mathbf{a}$ are given by
$$\begin{aligned}
\text{input gate: } & \mathbf{G}_i = \varphi_S \circ \mathcal{F}_i^{N_c}(\mathbf{a}), \\
\text{forget gate: } & \mathbf{G}_f = \varphi_S \circ \mathcal{F}_f^{N_c}(\mathbf{a}), \\
\text{output gate: } & \mathbf{G}_o = \varphi_S \circ \mathcal{F}_o^{N_c}(\mathbf{a}), \\
\text{internal state: } & \mathbf{s}_t = \mathbf{G}_f \odot \mathbf{s}_{t-1} + \mathbf{G}_i \odot \left( \varphi_T \circ \mathcal{F}_a^{N_c}(\mathbf{a}) \right), \\
\text{output: } & \mathbf{h}_t = \mathbf{G}_o \odot \varphi_T(\mathbf{s}_t),
\end{aligned}$$
in which $\varphi_S$ and $\varphi_T$ refer to sigmoid and hyperbolic tangent activation functions respectively, and $N_c$ is the number of hidden layer units in the LSTM network. Note that $\mathcal{F}^n$ refers to a linear operation given by a matrix multiplication and subsequent bias addition, i.e.,
$$\mathcal{F}^n(\mathbf{x}) = \mathbf{W}\mathbf{x} + \mathbf{B},$$
where $\mathbf{W} \in \mathbb{R}^{n \times m}$ and $\mathbf{B} \in \mathbb{R}^n$ for $\mathbf{x} \in \mathbb{R}^m$ and where $\mathbf{a} \odot \mathbf{b}$ refers to a Hadamard product of two vectors. The LSTM implementation will be used to advance $\mathbf{a}$ as a function of time in the reduced space. The LSTM network’s primary utility is the ability to control information flow through time with the use of the gating mechanisms. A greater value of the forget gate (post sigmoidal activation) allows for a greater preservation of past state information through the sequential inference of the LSTM, whereas a smaller value suppresses the influence of the past. Our LSTM deployment utilized 32 neurons in each of its input, forget, output and state calculation operations and a learning rate of 0.001. It uses a sequence-to-sequence prediction, applied as a rolling window, for predicting the output at the next timestep. We utilize a batch size of 16 samples with each sample having a sequence of 10 timesteps for all of our LSTM deployments. As in the previous learning approach, a set of data is kept aside for validation. This validation loss is used to make decisions about model selection. Note that the total number of samples (1000) is the same as the NODE deployment.
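The gating mechanism and the rolling-window data layout can be illustrated with a small NumPy sketch. Several things here are assumptions rather than details from the paper: the gates act on the concatenation of the previous hidden state and the current input (the common LSTM convention), the weights are random, and `make_windows` pairs each window of 10 timesteps with the single following timestep.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(n_in, n_hidden=32, seed=0):
    """Random (untrained) weights for the input, forget, output and state maps."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("i", "f", "o", "a"):
        p["W" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
        p["b" + gate] = np.zeros(n_hidden)
    return p

def lstm_step(a_t, h_prev, s_prev, p):
    """One LSTM update: gated carry-over of the state plus gated new input."""
    z = np.concatenate([h_prev, a_t])
    g_i = sigmoid(p["Wi"] @ z + p["bi"])        # input gate
    g_f = sigmoid(p["Wf"] @ z + p["bf"])        # forget gate
    g_o = sigmoid(p["Wo"] @ z + p["bo"])        # output gate
    candidate = np.tanh(p["Wa"] @ z + p["ba"])  # nonlinearly transformed input
    s_t = g_f * s_prev + g_i * candidate        # internal state update
    h_t = g_o * np.tanh(s_t)                    # hidden output
    return h_t, s_t

def make_windows(series, seq_len=10):
    """Split a (n_time, n_modes) series into windows and next-step targets."""
    inputs = np.stack([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    targets = series[seq_len:]
    return inputs, targets

# Toy modal-coefficient series (an assumption for illustration).
time = np.linspace(0.0, 2.0, 200)
series = np.stack([np.sin(time), np.cos(2 * time), np.sin(3 * time)], axis=1)
X, y = make_windows(series, seq_len=10)

params = init_lstm(n_in=3)
h, s = np.zeros(32), np.zeros(32)
for a_t in X[0]:                                # run the cell over one window
    h, s = lstm_step(a_t, h, s, params)
print(X.shape, y.shape, h.shape)
```

A forget gate close to one carries the previous state through almost unchanged, which is the mechanism the text associates with the retention of memory.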
2.3 Connection with Mori-Zwanzig formalism
In this section, we outline the Mori-Zwanzig formalism [40, 41] for the viscous Burgers equation and connect it to time-series learning in POD space. We frame the (full-order) dynamics evolution in latent space using the following formulation derived from the first step of the Mori-Zwanzig treatment
[TABLE]
where is the viscous Burgers operator given by
[TABLE]
We proceed by defining two self-adjoint projection operators into orthogonal subspaces given by
[TABLE]
with and . Therefore, may be assumed to be a projection of our full-order representation in POD space ( living in ) onto the reduced basis ( living in ). We can further expand our system as
[TABLE]
which may further be decoupled to a Markov-like projection operator given by
[TABLE]
and a memory operator given by
[TABLE]
for which we have used Dyson’s formula [42] and where corresponds to a hyperparameter that specifies the length of memory retention. The second relationship may be assumed to be a combination of memory effects and noise. The final evolution of the system can then be bundled into a linear combination of these two kernels i.e.,
[TABLE]
The reader may compare this expression with that of the internal state update within an LSTM which we revisit here -
[TABLE]
where a linear combination of a nonlinearly transformed input vector at time with the gated result of a hidden-state at a previous time is utilized for calculating the result vector at the current time. The process of carrying a state through time via gating may be assumed to be a representation of the memory integral (as well as the noise) whereas the utilization of the current input may be assumed to be the Markovian component of the map. In contrast, from the point of view of the NODE implementation, the goal is to learn directly through a neural network.
3 Experiments
In this section, we assess the performance of both NODE and LSTM frameworks in representing latent-space dynamics appropriately. We investigate two problems given by the viscous Burgers’ equation in a periodic domain. Both problems are advection dominated: the first has a moving discontinuity over time (which we shall designate the advecting shock problem) and the second is characterized by standing shocks of various magnitudes (which we designate Burgulence). Their problem statements and results are shown below.
3.1 Advecting shock
Our first problem is given by the following initial and boundary conditions
$$u(x,0) = \frac{x}{1 + \sqrt{\frac{1}{t_0}} \exp\left( Re \frac{x^2}{4} \right)}, \quad u(0,t) = 0, \quad u(L,t) = 0,$$
where we specify $L = 1$ and maximum time $t_{max} = 2$. An analytical solution for the above set of equations exists and is given by
$$u(x,t) = \frac{\frac{x}{t+1}}{1 + \sqrt{\frac{t+1}{t_0}} \exp\left( Re \frac{x^2}{4t+4} \right)},$$
where $t_0 = \exp(Re/8)$ and $Re = 1/\nu$ is kept fixed at 1000. We directly utilize this expression to generate our snapshot data for ROM assessments. A visualization of the time evolution of the initial condition is shown in Figure 1. As outlined in a previous assessment of this problem [14], a reduced basis constructed of 20 basis vectors retains 99.93% of the energy of the system. For the purpose of our assessments, we retain solely three modes which results in an unresolved ROM which corresponds to only 86.71% of the total energy - thus necessitating closure.
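Generating snapshot data directly from a closed-form solution can be sketched as below. The expression coded here is the commonly used analytical solution for this advecting shock benchmark with `Re = 1000`; the domain length of 1, the final time of 2 and the grid sizes are assumptions made for this illustration, as are the function names.

```python
import numpy as np

def advecting_shock(x, t, reynolds=1000.0):
    """Closed-form viscous Burgers solution with a shock moving to the right."""
    t0 = np.exp(reynolds / 8.0)
    growth = np.sqrt((t + 1.0) / t0) * np.exp(reynolds * x**2 / (4.0 * t + 4.0))
    return (x / (t + 1.0)) / (1.0 + growth)

def snapshot_matrix(n_dof=128, n_snap=100, length=1.0, t_max=2.0, reynolds=1000.0):
    """Evaluate the solution on a space-time grid; columns are snapshots in time."""
    x = np.linspace(0.0, length, n_dof)
    t = np.linspace(0.0, t_max, n_snap)
    return x, t, advecting_shock(x[:, None], t[None, :], reynolds)

x, t, U = snapshot_matrix()
shock_start = x[np.argmax(U[:, 0])]
shock_end = x[np.argmax(U[:, -1])]
print("approximate shock location at start and end:", shock_start, shock_end)
```

The resulting array has one column per time instant and can be passed to a POD routine after removing its temporal mean.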
We perform an optimization for learning the modal coefficient time series using the LSTM and NODE frameworks. To recap model specifics, we deploy NODE (using 40 neurons) and LSTM (using 32 hidden layer neurons) for learning the sequential nature of the modal coefficient evolution in POD space. We utilize the RMSprop optimizer using a learning rate of 0.01 for the former and 0.001 for the latter and a momentum coefficient of 0.9 for both. Batch sizes of 10 and 16 respectively are also used at each epoch of the learning phase. 1000 randomly chosen sequences of length 10 are utilized for learning and validation through time with 20% of the total data kept aside for the latter. We note that the best validation loss (aggregated over all validation samples) is utilized for model selection. Figure 2 shows the progress to convergence for both LSTM and NODE architectures during training for the first three modal coefficients. Both NODE and LSTM trainings are run until validation losses settle at a magnitude of around $10^{-4}$. It is observed that the LSTM framework reaches convergence more quickly although the oscillating losses of the NODE potentially indicate better exploration. We note that the oscillations may also indicate the requirement of a lower learning rate.
The time-series predictions for the trained frameworks are shown in Figure 3 for the coefficients of the first three retained modes. For the purpose of comparison, we also show predictions from GP and the true modal coefficients, the latter of which are utilized for training our time-series predictions. We can observe that both LSTM and NODE deployments capture coefficient trends accurately indicating that sequential behavior has been learned. The GP predictions can be seen to show unphysical growth in coefficient amplitude due to the absence of the finer modes. However, LSTM and NODE deployments embed memory into their formulation in the form of a hidden state or through explicit learning of a latent-space ODE respectively. The memory-based nature of their learning leads to excellent agreement with the true behavior of the resolved scales.
The final time reconstructions for the true as well as the GP, LSTM and NODE time-series predictions are shown in Figure 4. One can observe that at this severely truncated state, the discontinuity is not completely resolved. The GP reconstructions show the manifestation of large oscillations (now in the physical domain) whereas NODE and LSTM are able to recover the true solution well. Figures 5 and 6 show a validation of our learning in an ensemble sense, where multiple architectures (with slight differences in the hidden layer neurons) are able to recover similar trends in accuracy as examined through final time reconstruction ability. This reinforces our assumption that an implicit closure is being learned through time-series trends in a statistical manner. The corresponding training losses for the LSTM and NODE architectures are shown in Figures 7 and 8 where it can be seen similar learning trends are obtained with slight variations in the number of trainable parameters.
3.2 Burgers’ turbulence
Our next test case is given by the challenging Burgers’ turbulence or Burgulence test case which leads to multiscale behavior in wavenumber space. Our problem domain is given by a length, , and the initial condition is specified by an initial energy spectrum (in wavenumber space) given by
[TABLE]
where is the wavenumber and is the parameter at which the peak value of the energy spectrum is obtained. The constant is set to the following value
[TABLE]
in order to ensure a total energy of at the initial condition. The initial velocity magnitudes can be expressed in wavenumber space by the following relation with our previously defined spectrum,
[TABLE]
where is a uniform random number generated between 0 and 1 at each wavenumber. Note that this distribution is constrained by to ensure that a real initial condition in physical space is obtained. For the purpose of assessment, we use energy spectra given by
[TABLE]
The aforementioned initial conditions are solved for the viscous Burgers equation in wavenumber space by a Runge-Kutta Crank-Nicolson scheme as described in [43]. Note that our is chosen to be to ensure that sharp discontinuities emerge from the smooth initial condition. Our NODE and LSTM hyperparameters are identical to the previous test case. Our investigations here are performed for the initial condition (and its corresponding time evolution) as shown in Figure 9. It may be observed that there is a considerable multiscale element to the nature of the solution - which makes this a challenging problem for POD-ROM techniques.
Figure 10 shows reduced-space time-series evolutions of the three retained modal coefficients for the frameworks we are comparing. It can be observed that the LSTM and NODE techniques are successful in coefficient evolution stabilization in comparison to GP although the LSTM can be seen to add an element of phase error. The NODE, however, captures latent-space trends exceptionally well. The performance of these time-series learning models is further assessed by their reconstruction in physical space as shown in Figure 11 where it can be seen that the LSTM and NODE perform well in preventing spurious oscillations near discontinuities as exhibited by an unclosed GP evolution. A further validation of this hypothesis is observed in Figure 12 where kinetic energy spectra in wavenumber space show that the high residuals of the GP method are controlled effectively by the LSTM and NODE deployments. The LSTM is seen to result in slightly higher residuals for this particular test case and choice of hyperparameters and optimizer.
3.3 Improving performance through hyperparameter search
While the results obtained in the previous sections indicate an acceptable choice for hyperparameters, we utilize Deephyper [44] to improve the test performance of our frameworks. This is motivated by the comparatively poorer performance of the LSTM in the Burgulence experiment. Deephyper relies on an asynchronous model-based search (i.e., a dynamically updated surrogate model which is inexpensive to evaluate) for obtaining hyperparameters with the lowest validation losses. To ensure an expressive surrogate model which is still computationally tractable, we utilize random forests (RF). This results in a superior search algorithm as compared to both a random search as well as a genetic-algorithm-based search. We note that RF also gives us the ability to handle discrete and non-ordinal parameters directly without the need for any encoding. Deephyper is configured for searching the hyperparameter space using a standard Bayesian optimization framework. In other words, for each sampled configuration $x$, the surrogate model $\mathcal{M}$ predicts a mean value $\mu(x)$ for the validation loss and a standard deviation $\sigma(x)$. This information is utilized recursively to improve the approximation to the loss surface as predicted by $\mathcal{M}$. In terms of exploring the hyperparameter search space, evaluation points with small values of $\mu(x)$ indicate that $x$ can potentially result in the reduction of validation error subject to the accuracy of $\mathcal{M}$. Evaluation of points with large values of $\sigma(x)$ improves $\mathcal{M}$ since these locations are areas where $\mathcal{M}$ is least confident about the approximation surface. The next configuration $x$ is selected by minimizing an acquisition function given by
[TABLE]
where for encouraging exploration.
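The trade-off between low predicted loss and high surrogate uncertainty can be sketched with a lower-confidence-bound rule. This is an assumption about the general shape of such an acquisition function, not a reproduction of the paper's expression or of the Deephyper API: the weight `kappa`, its default of 1.96 and the toy ensemble predictions are all hypothetical.

```python
import numpy as np

def surrogate_statistics(ensemble_predictions):
    """Mean and spread over an ensemble, shape (n_members, n_candidates).

    For a random forest, each row would hold one tree's predicted loss.
    """
    return ensemble_predictions.mean(axis=0), ensemble_predictions.std(axis=0)

def lower_confidence_bound(mu, sigma, kappa):
    """Small values favor low predicted loss and/or high uncertainty."""
    return mu - kappa * sigma

def select_next(mu, sigma, kappa=1.96):
    """Index of the candidate configuration minimizing the acquisition."""
    return int(np.argmin(lower_confidence_bound(mu, sigma, kappa)))

# Toy predictions of validation loss from 4 ensemble members for 3 candidates.
predictions = np.array([
    [0.10, 0.20, 0.05],
    [0.11, 0.22, 0.60],
    [0.09, 0.18, 0.40],
    [0.10, 0.20, 0.15],
])
mu, sigma = surrogate_statistics(predictions)
print("exploit only:", select_next(mu, sigma, kappa=0.0))
print("with exploration:", select_next(mu, sigma, kappa=1.96))
```

With `kappa = 0` the rule simply picks the candidate with the lowest predicted loss; increasing `kappa` shifts the choice toward candidates where the surrogate is uncertain.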
Deephyper requires a range specification for real variables and a list of possible choices for discrete hyperparameters. Table 1 outlines the range of hyperparameters for the LSTM architecture utilized for the Burgers’ turbulence test case as well as the optimal hyperparameters obtained. A summary of the distribution of sampled hyperparameters and pairwise dependencies is also shown in Figure 13. Note that loss is encoded as negative since the hyperparameter search is based on objective function maximization. Hyperparameter correlations are summarized in Figure 14 where it is observed that most hyperparameters are weakly correlated with each other. However, it must be noted that these results are problem specific. In total, 2151 hyperparameter combinations were evaluated during the process of this search.
For the purpose of comparison, we also show results from a similar hyperparameter search experiment for the NODE but for the advecting shock experiment. The optimal parameters and ranges of this search are shown in Table 2. A summary of the distribution of sampled hyperparameters and pairwise dependencies is also shown in Figure 15. Correlation plots between hyperparameters are shown in Figure 16. In total, 734 hyperparameter combinations were evaluated during the process of this search.
Finally we deploy the optimal hyperparameter configuration for an a posteriori assessment with results as observed in Figure 17. It is observed that an improved performance has been obtained using the LSTM which now matches NODE and true observations. In addition, an analysis of the spectra and residuals in Figure 18 confirms the superior performance as well. Deephyper has successfully led to an improved LSTM architecture.
4 Discussion and conclusions
In this article, we have investigated the utility of using LSTMs and NODEs as non-intrusive learning models for the projections of nonlinear partial differential equations to a latent-space spanned by severely truncated POD modes. We note that the choice of the POD modes (which form a linear subspace) also ensures the applicability of the symmetries of the PDE-governed solution on the machine-learned predictions.
Our ideas are tested on two test cases governed by the viscous Burgers’ equation with the first exhibiting an advecting shock and the second displaying a multiscale nature in full-order space. Both LSTM and NODE formulations are seen to learn the transient nature of our systems in the reduced space since they exploit the sequential nature of data and end up providing an implicit closure. In the second case, we also utilize Deephyper, a scalable Bayesian optimization package, for an improved hyperparameter configuration choice in order to obtain superior performance for the LSTM. The non-i.i.d. assumption of the data and associated learning allows for the embedding of a memory effect which provides for accurate coarse-grained evolution of modal coefficients in a manner similar to the Mori-Zwanzig formalism. Our assessments reveal that the machine learning techniques studied here are able to provide stable evolutions of the modal coefficients in comparison to their intrusive and unclosed counterpart (i.e., GP).
We conclude by noting that ROM developments which incorporate history explicitly (such as in the LSTM) or implicitly (such as through a NODE) represent an attractive avenue for exploration for efficient reduced basis dynamics learning of systems which are advection-dominated.
5 Data availability
All the relevant data and codes for this study shall be provided in a public repository at https://github.com/Romit-Maulik/ML_ROM_Closures.
6 Acknowledgements
This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This research was funded in part and used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This project was also funded by the Los Alamos National Laboratory, 2019 LDRD grant “Machine Learning for Turbulence”. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. DOE or the United States Government.
References
- [1] K. Carlberg, C. Bou-Mosleh, C. Farhat, Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations, Int. J. Numer. Meth. Eng. 86 (2011) 155–181.
- [2] Z. Wang, I. Akhtar, J. Borggaard, T. Iliescu, Proper orthogonal decomposition closure models for turbulent flows: a numerical comparison, Comput. Meth. Appl. M. 237 (2012) 10–26.
- [3] O. San, J. Borggaard, Principal interval decomposition framework for POD reduced-order modeling of convective Boussinesq flows, Int. J. Numer. Meth. Fl. 78 (2015) 37–62.
- [4] F. Ballarin, A. Manzoni, A. Quarteroni, G. Rozza, Supremizer stabilization of POD–Galerkin approximation of parametrized steady incompressible Navier–Stokes equations, Int. J. Numer. Meth. Eng. 102 (2015) 1136–1161.
- [5] O. San, R. Maulik, Extreme learning machine for reduced order modeling of turbulent geophysical flows, Phys. Rev. E 97 (2018) 042322.
- [6] Q. Wang, J. S. Hesthaven, D. Ray, Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem, J. Comp. Phys. 384 (2019) 289–307.
- [7] Y. Choi, K. Carlberg, Space–time least-squares Petrov–Galerkin projection for nonlinear model reduction, SIAM J. Sci. Comput. 41 (2019) A26–A58.
- [8] J. L. Proctor, S. L. Brunton, J. N. Kutz, Dynamic mode decomposition with control, SIAM J. Appl. Dyn. Syst. 15 (2016) 142–161.
