Structure-preserving Method for Reconstructing Unknown Hamiltonian Systems from Trajectory Data

Kailiang Wu; Tong Qin; Dongbin Xiu

arXiv:1905.10396·math.NA·July 13, 2021

TL;DR

This paper introduces a structure-preserving numerical method for reconstructing unknown Hamiltonian systems from trajectory data, ensuring conservation laws are maintained and effectively handling noisy data.

Contribution

The method directly approximates the Hamiltonian function, preserving the system's structure and conservation properties, which is a novel approach compared to existing techniques.

Findings

01

Successfully reconstructs Hamiltonian systems from data

02

Maintains conservation of the Hamiltonian in reconstructions

03

Effective noise handling with a de-noising procedure

Abstract

We present a numerical approach for approximating unknown Hamiltonian systems using observation data. A distinct feature of the proposed method is that it is structure-preserving, in the sense that it enforces conservation of the reconstructed Hamiltonian. This is achieved by directly approximating the underlying unknown Hamiltonian, rather than the right-hand-side of the governing equations. We present the technical details of the proposed algorithm and its error estimate in a special case, along with a practical de-noising procedure to cope with noisy data. A set of numerical examples are then presented to demonstrate the structure-preserving property and effectiveness of the algorithm.

Tables

Table 1. Example 1: The errors and convergence rates of the Hamiltonian deviation $\Delta\widetilde{H}(t)$ computed by the fourth-order Runge-Kutta solver with different time step-sizes $\tau$ .

$τ$	$L^{\infty}$ -errors	order	$L^{2}$ -errors	order	total variation	order
$8 \times 10^{- 3}$	2.7493e-7	–	1.0668e-6	–	4.0899e-8	–
$4 \times 10^{- 3}$	2.1146e-8	3.70	8.3070e-8	3.68	1.2806e-9	5.00
$2 \times 10^{- 3}$	1.4455e-9	3.87	5.7047e-9	3.86	4.0064e-11	5.00
$1 \times 10^{- 3}$	9.4345e-11	3.94	3.7331e-10	3.93	1.2632e-12	4.99
$5 \times 10^{- 4}$	5.9859e-12	3.98	2.4324e-11	3.94	1.7065e-13	2.89
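The "order" columns can be checked directly from the listed errors. The following is a minimal sketch, assuming the usual definition of the observed order between consecutive rows, log(e_prev / e_cur) / log(tau_prev / tau_cur); the variable and function names are illustrative and not from the paper.

```python
import math

# Step sizes and L-infinity errors copied from Table 1.
taus = [8e-3, 4e-3, 2e-3, 1e-3, 5e-4]
linf_errors = [2.7493e-7, 2.1146e-8, 1.4455e-9, 9.4345e-11, 5.9859e-12]

def observed_orders(steps, errors):
    """Observed order between consecutive rows of a convergence table."""
    return [
        math.log(errors[i - 1] / errors[i]) / math.log(steps[i - 1] / steps[i])
        for i in range(1, len(errors))
    ]

linf_orders = observed_orders(taus, linf_errors)
print([round(o, 2) for o in linf_orders])
```

The same function can be applied to the $L^{2}$ and total variation columns.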

Equations

\frac{d\bm{p}}{dt} = -\nabla_{\bm{q}} H(\bm{p},\bm{q}),

\frac{d\bm{q}}{dt} = \nabla_{\bm{p}} H(\bm{p},\bm{q}),

\frac{d{\bf u}}{dt} = {\bf J}^{-1}\nabla_{\bf u} H({\bf u}),

{\bf J} = \begin{pmatrix} {\bf 0}_{d} & {\bf I}_{d} \\ -{\bf I}_{d} & {\bf 0}_{d} \end{pmatrix},

H({\bf u}(t;{\bf u}_{0})) = H({\bf u}_{0}), \quad \forall t \geq 0, \ \forall {\bf u}_{0},

\{\mathbf{x}_{k}, \dot{\mathbf{x}}_{k}\}, \quad k = 1, \dots, K,

\mathbf{x}^{(m)}(t_{j}^{(m)})

\dot{\mathbf{x}}^{(m)}(t_{j}^{(m)})

\mathbf{x}^{(m)}(t_{j}) = {\bf u}(t_{j};{\bf u}_{0}^{(m)}) + {\bm{\epsilon}}_{j}^{(m)}, \quad j = 0, \dots, J,

\dot{\mathbf{x}}^{(m)}(t_{j}) = \frac{\mathbf{x}^{(m)}(t_{j+1}) - \mathbf{x}^{(m)}(t_{j-1})}{2\Delta t}, \quad 1 \leq j \leq J-1,

{\bm{\mathcal{P}}}_{m} = \operatorname*{argmin}_{{\bm{\mathcal{P}}} \in [\mathbb{P}^{1}_{Q}]^{2d}} \sum_{j=0}^{J} \left\| {\bm{\mathcal{P}}}(t_{j}) - \mathbf{x}^{(m)}(t_{j}) \right\|_{2}^{2}, \quad m = 1, \dots, M,

\dot{\mathbf{x}}^{(m)}(t_{j}) := \frac{d}{dt}{\bm{\mathcal{P}}}_{m}(t_{j}) \approx \frac{d}{dt}{\bf u}(t_{j};{\bf u}_{0}^{(m)}).

\mathbf{x}^{(m)}(t_{j}) \leftarrow {\bm{\mathcal{P}}}_{m}(t_{j}).

\frac{d\widetilde{\bf u}}{dt} = {\bf J}^{-1}\widetilde{\bf f}(\widetilde{\bf u})

(g,h)_{\mathbb{H}^{1}_{\omega}} = \int_{D} \big( g h + \nabla g \cdot \nabla h \big)\, {\rm d}\omega(\mathbf{x}),

\mathbb{V} := \{\nabla h : h \in \mathbb{W}\}, \quad N = \dim \mathbb{V} \geq 1.

{\bm{\psi}}_{j} = \nabla\phi_{j}, \quad 1 \leq j \leq N.

\nabla\widetilde{H} = \operatorname*{argmin}_{\nabla h \in \mathbb{V}} \sum_{k=1}^{K} \left\| {\bf J}\dot{\mathbf{x}}_{k} - \nabla h(\mathbf{x}_{k}) \right\|_{2}^{2},

\nabla\widetilde{H}(\mathbf{x}) = \sum_{j=1}^{N} c_{j}{\bm{\psi}}_{j}(\mathbf{x}) = \sum_{j=1}^{N} c_{j}\nabla\phi_{j}(\mathbf{x}).

\widetilde{H}(\mathbf{x}) = C + \sum_{j=1}^{N} c_{j}\phi_{j}(\mathbf{x}) =: C + \widetilde{H}_{0}(\mathbf{x}),

\min_{{\bf c} \in \mathbb{R}^{N}} \| {\bf A}{\bf c} - {\bf b} \|_{2},

{\bf A} = (a_{ij})_{1 \leq i,j \leq N}, \quad {\bf b} = (b_{1}, \dots, b_{N})^{\top},

a_{ij} = \frac{1}{K}\sum_{k=1}^{K} \Big( \nabla\phi_{i}(\mathbf{x}_{k}) \cdot \nabla\phi_{j}(\mathbf{x}_{k}) \Big), \quad 1 \leq i,j \leq N,

b_{i} = \frac{1}{K}\sum_{k=1}^{K} \Big( ({\bf J}\dot{\mathbf{x}}_{k}) \cdot \nabla\phi_{i}(\mathbf{x}_{k}) \Big), \quad 1 \leq i \leq N.

\widetilde{H}(\widetilde{\bf u}(t;{\bf u}_{0})) = \widetilde{H}({\bf u}_{0}), \quad \forall t \geq 0, \ \forall {\bf u}_{0}.

\int_{D} \nabla\phi_{i}(\mathbf{x}) \cdot \nabla\phi_{j}(\mathbf{x})\, {\rm d}\omega(\mathbf{x}) = \delta_{ij}.

{\rm Prob}\big\{ \|{\bf A} - {\bf I}\| > \delta \big\} \leq 2N \exp\left( -\frac{\beta_{\delta} K}{\mathscr{K}_{N}} \right),

\mathscr{K}_{N} := \sup_{\mathbf{x} \in D} \sum_{j=1}^{N} \|\nabla\phi_{j}(\mathbf{x})\|_{2}^{2}.

{\rm Prob}\bigg\{ \|{\bf A} - {\bf I}\| > \frac{1}{2} \bigg\} \leq 2K^{-r},

\mathscr{K}_{N} \leq \lambda\frac{K}{\log K}, \quad \mbox{with} \quad \lambda := \frac{3\log(3/2) - 1}{2 + 2r}.

\|{\bm{\tau}}_{k}\|_{2} = \left\| \dot{\mathbf{x}}_{k} - {\bf J}^{-1}\nabla H(\mathbf{x}_{k}) \right\|_{2} \leq \tau_{\infty}, \quad \forall \mathbf{x}_{k} \in D,

\|\nabla H(\mathbf{x})\|_{2} \leq L < +\infty, \quad \mathbf{x} \in D \ \ a.e.

Full text

Structure-preserving Method for Reconstructing Unknown Hamiltonian Systems from Trajectory Data

Kailiang Wu, Tong Qin, Dongbin Xiu

Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA. [email protected], [email protected], [email protected]. Funding: This work was partially supported by AFOSR FA9550-18-1-0102.

Abstract

We present a numerical approach for approximating unknown Hamiltonian systems using observational data. A distinct feature of the proposed method is that it is structure-preserving, in the sense that it enforces the conservation of the reconstructed Hamiltonian. This is achieved by directly approximating the underlying unknown Hamiltonian, rather than the right-hand-side of the governing equations. We present the technical details of the proposed algorithm and its error estimate in a special case, along with a practical de-noising procedure to cope with noisy data. A set of numerical examples are presented to demonstrate the structure-preserving property and effectiveness of the algorithm.

keywords:

data-driven discovery, Hamiltonian systems, equation approximation, structure-preserving method

1 Introduction

Data-driven discovery of physical laws has received an increasing amount of attention recently. While earlier attempts such as [2, 37] used symbolic regression to select the proper physical laws and determine the underlying dynamical systems, more recent efforts tend to treat the problem as an approximation problem. In this approach, the sought-after governing equation is treated as an unknown target function relating the data of the state variables to their temporal derivatives. Methods along this line usually seek to recover the equations exactly by using certain sparse approximation techniques (e.g., [40]) from a large set of dictionaries; see, for example, [4]. Many studies have been conducted to effectively deal with noise in data [4, 35], corruptions in data [41], limited data [36], partial differential equations [32, 34], etc., and in conjunction with other methods such as model selection [22], Koopman theory [3], and Gaussian process regression [28], to name a few. Standard approximations using polynomials without seeking exact recovery can also be effective (cf. [44]). More recently, there has been a surge of work that tackles the problem using machine learning methods, particularly neural networks [29, 30], for systems involving ordinary differential equations (ODEs) [10, 31, 33, 26, 25] and partial differential equations (PDEs) [23, 21, 15, 13, 27, 20, 45]. Neural network structures such as the residual network (ResNet) were shown to be highly suitable for this type of problem [26].

Hamiltonian systems are an important class of governing equations in science and engineering. One of the most important properties of Hamiltonian systems is the conservation of the Hamiltonian, usually a nonlinear function of the state variables, along trajectories. Research efforts have been devoted to estimating the Hamiltonian function of a given system from measurements; cf. [38, 43, 11, 1, 19]. However, few studies exist on reconstructing an unknown Hamiltonian dynamical system from trajectory data of the state variables.

In this paper, we present a numerical approach to reconstruct an unknown Hamiltonian system from its trajectory data. The focus of this paper is on the conservation of the reconstructed Hamiltonian along the solution trajectories. The current method is an extension of the method proposed in [44], which seeks accurate approximation of unknown governing equations using orthogonal polynomials. However, instead of approximating the governing equations directly, as in [44] and most other existing studies, our current method seeks to approximate the unknown Hamiltonian first and then reconstruct the approximate governing equations using the approximate Hamiltonian. The approximation of the unknown Hamiltonian is conducted using orthogonal polynomials and with controllable numerical errors. Since in most practical situations the Hamiltonian is a smooth function, polynomial approximation can achieve high-order accuracy with polynomials of modest degree. The resulting approximate governing equations, which are derived from the reconstructed Hamiltonian, automatically conserve the reconstructed Hamiltonian, which is an accurate approximation of the true Hamiltonian. This structure-preserving (SP) property, namely the conservation of Hamiltonians along trajectories, is a distinctly new feature of our present method, not found in most of the existing studies. Along with a detailed exposition of the algorithm, we also provide an error estimate of the method in a special case and use a set of numerical examples to demonstrate the properties of the method.

2 Preliminaries

In this section, we introduce some basics about Hamiltonian systems and the setup of our data-driven discovery of Hamiltonian systems.

2.1 Hamiltonian Systems

Let us consider a Hamiltonian system

$$\frac{d\bm{p}}{dt}=-\nabla_{\bm{q}}H(\bm{p},\bm{q}),\qquad\frac{d\bm{q}}{dt}=\nabla_{\bm{p}}H(\bm{p},\bm{q}),\tag{1}$$

where $\bm{p}$ and $\bm{q}$ are column vectors in $\mathbb{R}^{d}$ , and $H(\bm{p},\bm{q})$ is a continuously differentiable scalar function called the Hamiltonian. The Hamiltonian often represents the total energy of the system. It is not unique and is defined up to an arbitrary constant.

Let ${\bf u}:=(\bm{p}^{\top},\bm{q}^{\top})^{\top}$ be the state variable vector. The Hamiltonian system (1) can be equivalently written as

$$\frac{d{\bf u}}{dt}={\bf J}^{-1}\nabla_{\bf u}H({\bf u}),\tag{2}$$

where $\nabla_{\bf u}$ stands for full gradient and the matrix $\bf J$ takes the form

$${\bf J}=\begin{pmatrix}{\bf 0}_{d}&{\bf I}_{d}\\-{\bf I}_{d}&{\bf 0}_{d}\end{pmatrix},$$

with ${\bf I}_{d}$ and ${\bf 0}_{d}$ being the identity matrix and the zero matrix of size $d\times d$ , respectively. Hereafter, we will use $\nabla$ in place of $\nabla_{\bf u}$ , unless confusion arises.

The Hamiltonian system (2) is an autonomous system and conserves the Hamiltonian along the integral curves (cf. [24]). That is, the solution of the Hamiltonian system (2) satisfies

$$H({\bf u}(t;{\bf u}_{0}))=H({\bf u}_{0}),\quad\forall t\geq 0,\ \forall{\bf u}_{0},$$

where ${\bf u}_{0}$ is the initial state of the system at $t=0$ , and ${\bf u}(t;{\bf u}_{0})$ stands for the solution ${\bf u}$ at time $t$ with an initial state ${\bf u}_{0}$ .
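The conservation property follows from the chain rule: along a solution of (1), dH/dt equals the inner product of the gradient of H with the right-hand side, and the two terms cancel. The following is a minimal sketch for $d=1$, assuming an illustrative pendulum-type Hamiltonian H(p, q) = p^2/2 + (1 - cos q) that is not taken from the paper; all function names are hypothetical.

```python
import math

def grad_H(p, q):
    """Gradient (dH/dp, dH/dq) of the illustrative H(p, q) = p**2 / 2 + (1 - cos q)."""
    return (p, math.sin(q))

def rhs(p, q):
    """Right-hand side (dp/dt, dq/dt) = (-dH/dq, dH/dp) of system (1)."""
    dH_dp, dH_dq = grad_H(p, q)
    return (-dH_dq, dH_dp)

def dH_dt(p, q):
    """Chain rule: dH/dt = grad H . du/dt, which should vanish at every state."""
    dH_dp, dH_dq = grad_H(p, q)
    dp_dt, dq_dt = rhs(p, q)
    return dH_dp * dp_dt + dH_dq * dq_dt

samples = [(0.3, -1.2), (-2.0, 0.7), (1.5, 3.0)]
print([dH_dt(p, q) for p, q in samples])
```

The cancellation does not depend on the particular H, which is why any system derived from a scalar function in this way conserves that function.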

2.2 Data and Problem Setup

We assume that the equations of the Hamiltonian system (1), or (2), are not known. Our data-driven approach for approximating the unknown equations requires the availability of a set of data pairings between the solution states and their corresponding time derivatives, in the form of

$$\{\mathbf{x}_{k},\dot{\mathbf{x}}_{k}\},\quad k=1,\dots,K,\tag{4}$$

where $\dot{\mathbf{x}}$ denotes the time derivative of $\mathbf{x}$ , and $K$ is the total number of data pairs.

2.2.1 Direct Data Collection

Let $D\subset{\mathbb{R}}^{2d}$ be a bounded domain. It is the domain of interest, inside which we seek to construct an accurate approximation to the unknown governing equations (2). Our data set (4) shall be collected in the domain $D$ .

When time derivatives of the state variables are readily available, either directly measured from experiments or computed via certain numerical simulation techniques, the data collection procedure is straightforward. Let $M\geq 1$ be the number of solution trajectories, originating from the initial states ${\bf u}_{0}^{(1)},\dots,{\bf u}_{0}^{(M)}$ . Let $0=t_{0}^{(m)}<t_{1}^{(m)}<\cdots<t_{J_{m}}^{(m)}$ be a sequence of time instances on the $m$ -th trajectory, for $m=1,\dots,M$ . We assume that the state variable data and their derivative data are available at these time instances, i.e., for $0\leq j\leq J_{m}$ and $1\leq m\leq M$ ,

[TABLE]

where ${\bm{\epsilon}}_{j}^{(m)}$ and ${\bm{\tau}}_{j}^{(m)}$ are errors/noises in the data for the state variables and their time derivatives, respectively.

Once all the data pairs are collected, they are grouped in the set (4), where we omit the subscripts and superscripts for notational convenience. This is because our method for approximating the governing equation using the data does not utilize the trajectory or time instance information associated with each data pair.

2.2.2 Time Derivatives Approximation and De-noising

In many practical situations, time derivative data are not available. Consequently, one only possesses trajectory data for the state variables. In this case, it is necessary to estimate the time derivatives of the state variables via a numerical procedure.

Again, let $M$ be the number of trajectories where only the state variable data are available. For notational convenience we let $0=t_{0}<t_{1}<\cdots<t_{J}$ be a sequence of the same time instances on all trajectories, where the state variable data

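$$\mathbf{x}^{(m)}(t_{j})={\bf u}(t_{j};{\bf u}_{0}^{(m)})+{\bm{\epsilon}}_{j}^{(m)},\quad j=0,\dots,J,\tag{7}$$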

are available. Again, ${\bm{\epsilon}}_{j}^{(m)}$ stands for errors/noises in the data. To numerically estimate the time derivatives, it is necessary to require $J\geq 1$ , i.e., there need to be at least two data entries of the state variables along each trajectory in order to estimate the time derivatives.

For noiseless data, i.e. ${\bm{\epsilon}}_{j}^{(m)}=\mathbf{0}$ , time derivatives can be computed by straightforward numerical differentiation. For example, for equally spaced time instances with uniform step-size $\Delta t$ , a second-order finite difference

$$\dot{\mathbf{x}}^{(m)}(t_{j})=\frac{\mathbf{x}^{(m)}(t_{j+1})-\mathbf{x}^{(m)}(t_{j-1})}{2\Delta t},\quad 1\leq j\leq J-1,\tag{8}$$

with proper one-sided second-order finite difference at the end points $j=0$ and $j=J$ . This requires at least three data entries on each trajectory, i.e., $J\geq 2$ , and induces errors of $O(\Delta t^{2})$ . Higher order approximations require more data points on each trajectory.
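The finite-difference estimate above can be sketched as follows for a single scalar component on a uniform grid. The one-sided end-point formulas used here are the standard three-point second-order stencils, which is an assumption since the text does not spell them out; the function name is hypothetical.

```python
def second_order_derivative(x, dt):
    """Second-order finite-difference estimate of dx/dt on a uniform grid.

    x  : list of samples x(t_0), ..., x(t_J) with J >= 2
    dt : uniform step size
    """
    J = len(x) - 1
    if J < 2:
        raise ValueError("need at least three samples (J >= 2)")
    dx = [0.0] * (J + 1)
    # One-sided second-order formulas at the two end points (assumed stencils).
    dx[0] = (-3.0 * x[0] + 4.0 * x[1] - x[2]) / (2.0 * dt)
    dx[J] = (3.0 * x[J] - 4.0 * x[J - 1] + x[J - 2]) / (2.0 * dt)
    # Central difference at the interior points.
    for j in range(1, J):
        dx[j] = (x[j + 1] - x[j - 1]) / (2.0 * dt)
    return dx

dt = 0.1
t = [j * dt for j in range(6)]
x = [tj ** 2 for tj in t]
print(second_order_derivative(x, dt))
```

For vector-valued states the same routine would be applied to each component separately. All three stencils are exact for quadratic data, which gives a simple sanity check.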

For noisy state variable data with ${\bm{\epsilon}}_{j}^{(m)}\neq\mathbf{0}$ , direct numerical differentiation is less robust, as the errors in estimating ${\bf\dot{x}}^{(m)}(t_{j})$ would scale as $\sim{\bm{\epsilon}}_{j}^{(m)}/\Delta t$ . Several techniques have been developed for numerical differentiation of noisy data. See, for example, [16, 42, 9, 5, 17]. In this paper, we employ a straightforward de-noising approach, which has been shown to be effective for equation recovery ([44]). We first construct a least squares polynomial approximation of the trajectory using the available data on $\bf{x}$ , and then analytically differentiate the least squares fitted polynomial to obtain an estimate of the time derivatives. More specifically, for each trajectory, the least squares polynomial approximation is to find a polynomial vector ${\bm{\mathcal{P}}}_{m}\in[\mathbb{P}^{1}_{Q}]^{2d}$ such that

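$${\bm{\mathcal{P}}}_{m}=\operatorname*{argmin}_{{\bm{\mathcal{P}}}\in[\mathbb{P}^{1}_{Q}]^{2d}}\sum_{j=0}^{J}\left\|{\bm{\mathcal{P}}}(t_{j})-\mathbf{x}^{(m)}(t_{j})\right\|_{2}^{2},\quad m=1,\dots,M,\tag{9}$$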

where $\|\cdot\|_{2}$ denotes vector 2-norm and $\mathbb{P}_{Q}^{1}$ denotes the space of one-dimensional polynomials of degree at most $Q$ , with $1\leq Q\leq J$ . Once the least squares fitting problem is solved, the time derivatives can be approximated by differentiating the polynomials, i.e.,

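$$\dot{\mathbf{x}}^{(m)}(t_{j}):=\frac{d}{dt}{\bm{\mathcal{P}}}_{m}(t_{j})\approx\frac{d}{dt}{\bf u}(t_{j};{\bf u}_{0}^{(m)}).\tag{10}$$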

This approach also provides a filter to de-noise the noisy trajectory data. For noisy data, we advocate the use of the filtered trajectory data to replace the original noisy data, i.e.,

$$\mathbf{x}^{(m)}(t_{j})\leftarrow{\bm{\mathcal{P}}}_{m}(t_{j}).\tag{11}$$

Our numerical experiments indicate that this filtering procedure can improve the learning accuracy for noisy data. This is similar to the results from [44].
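The de-noising step can be illustrated with a short NumPy sketch: fit a degree-$Q$ least squares polynomial to each state component of one trajectory, evaluate it to obtain the filtered states, and differentiate it analytically. The function name, the synthetic test trajectory, and the use of `numpy.polyfit` are assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np

def denoise_trajectory(t, x, Q):
    """Fit a degree-Q least squares polynomial to each state component.

    t : (J+1,) array of time instances
    x : (J+1, n_states) array of (possibly noisy) states
    Returns the filtered states P_m(t_j) and the derivatives dP_m/dt(t_j).
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    filtered = np.empty_like(x)
    derivative = np.empty_like(x)
    for i in range(x.shape[1]):
        coeffs = np.polyfit(t, x[:, i], Q)      # coefficients, highest degree first
        filtered[:, i] = np.polyval(coeffs, t)  # de-noised states
        derivative[:, i] = np.polyval(np.polyder(coeffs), t)  # analytic derivative
    return filtered, derivative

# Synthetic example: a smooth trajectory with multiplicative noise (assumed data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 41)
clean = np.column_stack([np.sin(t), np.cos(t)])
noisy = clean * (1.0 + rng.uniform(-0.08, 0.08, size=clean.shape))
filtered, derivative = denoise_trajectory(t, noisy, Q=5)
print(np.abs(filtered - clean).max(), np.abs(noisy - clean).max())
```

The two printed numbers compare the worst-case error of the filtered and the raw data against the clean trajectory for this particular synthetic case.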

Remark 2.1.

If the true trajectories ${\bf u}(t;{\bf u}_{0}^{(m)})$ are non-smooth, estimating time derivatives using global approximation (9) might not be sufficiently accurate. In this case, piecewise approximation should be considered.

3 The Main Method

With the data pairs (4), our goal is now to accurately approximate the unknown Hamiltonian system (2). Let ${\bf f}:=\nabla H$ , which is the unknown right-hand-side of (2). We seek an accurate approximation $\widetilde{\bf f}\approx{\bf f}$ such that

$$\frac{d\widetilde{\bf u}}{dt}={\bf J}^{-1}\widetilde{\bf f}(\widetilde{\bf u})\tag{12}$$

is an accurate approximation of the true system (2). Our key goal is to ensure the approximate system is also Hamiltonian, in the sense that $\widetilde{\bf f}=\nabla\widetilde{H}$ , where $\widetilde{H}$ becomes an approximation to the true (and unknown) Hamiltonian. The existing methods for equation recovery seek to approximate the right-hand-side of the true system directly and therefore do not enforce the conservation of the Hamiltonian.

3.1 Algorithm

To preserve the Hamiltonian, we propose to directly approximate the unknown Hamiltonian first and then derive the approximate governing equations from the approximate Hamiltonian.

Let us assume the unknown Hamiltonian $H\in{\mathbb{H}^{1}_{\omega}}(D)$ , which is a weighted Sobolev space on domain $D\subset\mathbb{R}^{2d}$ equipped with inner product

$$(g,h)_{\mathbb{H}^{1}_{\omega}}=\int_{D}\big(gh+\nabla g\cdot\nabla h\big)\,{\rm d}\omega(\mathbf{x}),$$

where $\omega(\mathbf{x})$ is a (probability) measure defined on $D$ .

Let $\mathbb{W}\subset{\mathbb{H}^{1}_{\omega}}(D)$ be a finite dimensional subspace. We then define its associated gradient function space as

$$\mathbb{V}:=\{\nabla h:h\in\mathbb{W}\},\quad N=\dim\mathbb{V}\geq 1.\tag{13}$$

Let $\{{\bm{\psi}}_{j}({\bf x})\}_{j=1}^{N}$ be a basis for $\mathbb{V}$ . Then, for each $j=1,\dots,N$ , there exists a function $\phi_{j}\in\mathbb{W}$ such that

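$${\bm{\psi}}_{j}=\nabla\phi_{j},\quad 1\leq j\leq N.\tag{14}$$

We then seek $\widetilde{H}\in\mathbb{W}$ as an approximation to the true Hamiltonian $H$ and $\widetilde{\bf f}=\nabla\widetilde{H}\in\mathbb{V}$ as an approximation to $\nabla H$ . Assuming that $N<K$ , i.e., the dimension of the linear subspace is smaller than the total number of available data pairs (4), we define the following least squares problem

$$\nabla\widetilde{H}=\operatorname*{argmin}_{\nabla h\in\mathbb{V}}\sum_{k=1}^{K}\left\|{\bf J}\dot{\mathbf{x}}_{k}-\nabla h(\mathbf{x}_{k})\right\|_{2}^{2},\tag{15}$$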

where $\|\cdot\|_{2}$ denotes vector 2-norm.

With the basis (14), $\nabla\widetilde{H}$ can be expressed as

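$$\nabla\widetilde{H}(\mathbf{x})=\sum_{j=1}^{N}c_{j}{\bm{\psi}}_{j}(\mathbf{x})=\sum_{j=1}^{N}c_{j}\nabla\phi_{j}(\mathbf{x}).\tag{16}$$

This provides a class of approximate Hamiltonians which differ only in an additive constant $C$ :

$$\widetilde{H}(\mathbf{x})=C+\sum_{j=1}^{N}c_{j}\phi_{j}(\mathbf{x})=:C+\widetilde{H}_{0}(\mathbf{x}),\tag{17}$$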

where the constant $C$ can be arbitrarily chosen and does not affect the resulting approximate Hamiltonian system (12). In particular, when taking $C=0$ , we use the notation $\widetilde{H}_{0}$ for $\widetilde{H}$ . The problem (15) is then equivalent to the following problem for the unknown coefficients ${\bf c}=(c_{1},\dots,c_{N})^{\top},$

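$$\min_{{\bf c}\in\mathbb{R}^{N}}\|{\bf A}{\bf c}-{\bf b}\|_{2},\tag{18}$$

where

$${\bf A}=(a_{ij})_{1\leq i,j\leq N},\quad{\bf b}=(b_{1},\dots,b_{N})^{\top},$$

with

$$a_{ij}=\frac{1}{K}\sum_{k=1}^{K}\Big(\nabla\phi_{i}(\mathbf{x}_{k})\cdot\nabla\phi_{j}(\mathbf{x}_{k})\Big),\quad 1\leq i,j\leq N,\qquad b_{i}=\frac{1}{K}\sum_{k=1}^{K}\Big(({\bf J}\dot{\mathbf{x}}_{k})\cdot\nabla\phi_{i}(\mathbf{x}_{k})\Big),\quad 1\leq i\leq N.$$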

This is an over-determined system of equations and can be readily solved. Upon solving this least squares type problem, we obtain $\widetilde{H}$ and subsequently $\widetilde{\bf f}=\nabla\widetilde{H}$ , which gives us the approximate system of equations (12). It is trivial to see that the system preserves the approximate Hamiltonian $\widetilde{H}$ in the following sense.

Theorem 1.

Let $\widetilde{\bf u}(t;{\bf u}_{0})$ be the solution of the system (12) with initial state ${\bf u}_{0}$ , then,

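$$\widetilde{H}(\widetilde{\bf u}(t;{\bf u}_{0}))=\widetilde{H}({\bf u}_{0}),\quad\forall t\geq 0,\ \forall{\bf u}_{0}.$$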
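The construction of Section 3.1 can be sketched for $d=1$ as follows. This is a minimal illustration under stated assumptions: a monomial basis is swapped in for the orthogonal polynomials used in the paper, the data are synthetic and generated from an assumed Hamiltonian H(p, q) = p^2/2 + q^2/2, and the fit solves the stacked least squares problem (15) directly with `numpy.linalg.lstsq` rather than forming the matrix ${\bf A}$ and vector ${\bf b}$. All function names are hypothetical.

```python
import numpy as np

def monomial_exponents(n):
    """Exponents (a, b) of p**a * q**b with 1 <= a + b <= n (constant excluded)."""
    return [(a, s - a) for s in range(1, n + 1) for a in range(s, -1, -1)]

def gradient_design(x, exps):
    """Stack grad phi_j(x_k) into a (2K, N) matrix: d/dp rows first, then d/dq rows."""
    p, q = x[:, 0], x[:, 1]
    d_dp = np.column_stack([a * p ** max(a - 1, 0) * q ** b for a, b in exps])
    d_dq = np.column_stack([b * p ** a * q ** max(b - 1, 0) for a, b in exps])
    return np.vstack([d_dp, d_dq])

def fit_hamiltonian(x, xdot, n):
    """Least squares fit of grad H~ to the data, cf. (15); returns exponents and c."""
    exps = monomial_exponents(n)
    G = gradient_design(x, exps)
    # With u = (p, q) and system (1): dH/dp = dq/dt and dH/dq = -dp/dt.
    target = np.concatenate([xdot[:, 1], -xdot[:, 0]])
    c, *_ = np.linalg.lstsq(G, target, rcond=None)
    return exps, c

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(200, 2))        # columns: p, q
# Synthetic derivatives from the assumed H: dp/dt = -q, dq/dt = p.
xdot = np.column_stack([-x[:, 1], x[:, 0]])
exps, c = fit_hamiltonian(x, xdot, n=2)
print(dict(zip(exps, np.round(c, 6))))
```

Because the right-hand side of the recovered system is the gradient of a single scalar polynomial, the conservation stated in Theorem 1 holds by construction, whatever the fitted coefficients are.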

3.2 Analysis

We now present an error analysis for the proposed algorithm in a special case. Our analysis is based on a few basic results from [7] for least squares polynomial approximations, which require the following assumptions on the basis functions and the data.

3.2.1 Assumptions

The basis functions $\{\phi_{j}\}_{j=1}^{N}$ are assumed to be orthonormal in the following sense

$$\int_{D}\nabla\phi_{i}(\mathbf{x})\cdot\nabla\phi_{j}(\mathbf{x})\,{\rm d}\omega(\mathbf{x})=\delta_{ij}.$$

Note that this assumption is only needed for the theoretical analysis. The practical computation of $\nabla\widetilde{H}$ can be conducted using any basis of $\mathbb{V}$ , or any known basis of $\mathbb{W}$ , since the solution to the problem (15) does not depend on the chosen basis. (Also, any non-orthogonal basis can be orthogonalized via the Gram-Schmidt procedure.) The choice of basis does, however, affect the stability of the least squares problem (18). Because the solution is independent of the basis, the error estimates in Section 3.2.3 also hold for any other bases of $\mathbb{W}$ .

We assume that data for the state variable $\mathbf{x}_{k}$ , $k=1,2,\dots,K$ , are i.i.d. drawn from a probability measure $\omega(\mathbf{x})$ on $D$ . This is a standard assumption, made mostly to facilitate theoretical analysis. See, for example, [7].

3.2.2 Stability

The following stability result holds for the least squares problem (18).

Lemma 2.

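For the problem (18), it holds that, for $0<\delta<1$ ,

$${\rm Prob}\big\{\|{\bf A}-{\bf I}\|>\delta\big\}\leq 2N\exp\left(-\frac{\beta_{\delta}K}{\mathscr{K}_{N}}\right),$$

where $\beta_{\delta}:=(1+\delta)\log(1+\delta)-\delta>0$ , and

$$\mathscr{K}_{N}:=\sup_{\mathbf{x}\in D}\sum_{j=1}^{N}\|\nabla\phi_{j}(\mathbf{x})\|_{2}^{2}.$$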

Proof.

The proof is a direct extension of the proof of Theorem 1 in [7] (see also [8] for a correction). ∎

Remark 3.1.

The function $\sum_{j=1}^{N}\left\|\nabla\phi_{j}(\mathbf{x})\right\|_{2}^{2}$ is the “diagonal” of the reproducing kernel of $\mathbb{V}$ . It is independent of the choice of the orthonormal basis and only depends on the space $\mathbb{V}$ and the measure $\omega$ .

The following result is a direct consequence of Lemma 2 with $\delta=\frac{1}{2}$ .

Corollary 3.

The least squares problem (18) is stable in the following sense: for any $r>0$ ,

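$${\rm Prob}\bigg\{\|{\bf A}-{\bf I}\|>\frac{1}{2}\bigg\}\leq 2K^{-r},$$

provided that

$$\mathscr{K}_{N}\leq\lambda\frac{K}{\log K},\quad\mbox{with}\quad\lambda:=\frac{3\log(3/2)-1}{2+2r}.\tag{24}$$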
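The constant $\lambda$ is $\beta_{1/2}/(1+r)$ : with $\delta=\frac{1}{2}$ and the condition (24), the exponent in Lemma 2 is at least $(1+r)\log K$ , so the bound becomes $2NK^{-(1+r)}\leq 2K^{-r}$ when $N\leq K$ . The following small sketch evaluates these constants and the largest value of the kernel bound that (24) permits for a given sample size; the function names are hypothetical.

```python
import math

def beta(delta):
    """beta_delta = (1 + delta) * log(1 + delta) - delta from Lemma 2."""
    return (1.0 + delta) * math.log(1.0 + delta) - delta

def lam(r):
    """lambda = (3 * log(3/2) - 1) / (2 + 2r) from condition (24)."""
    return (3.0 * math.log(1.5) - 1.0) / (2.0 + 2.0 * r)

def max_kernel_bound(K, r):
    """Largest kernel bound allowed by condition (24) for K samples."""
    return lam(r) * K / math.log(K)

for K in (10**3, 10**4, 10**5):
    print(K, max_kernel_bound(K, r=1.0))
```

Read the other way round, the condition indicates how many samples $K$ are needed for a given approximation space before the stability estimate applies.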

3.2.3 Error bound

To analyze errors in the proposed algorithm, we consider only the noiseless data case. For noisy data, the analysis is considerably more involved and will be pursued in a separate study.

For noiseless data, we consider the more practical case when only state variable data are available and time derivative data are computed numerically. Since the state variable data are noiseless, the only errors in the data set (4) are the numerical approximation errors for the time derivatives, as discussed in Section 2.2.2. We assume the approximation errors ${\bm{\tau}}_{k}:=\dot{\bf x}_{k}-{\bf J}^{-1}\nabla H\big{(}{\bf x}_{k}\big{)}$ are uniformly bounded, i.e.,

$$\|{\bm{\tau}}_{k}\|_{2}=\left\|\dot{\mathbf{x}}_{k}-{\bf J}^{-1}\nabla H(\mathbf{x}_{k})\right\|_{2}\leq\tau_{\infty},\quad\forall\mathbf{x}_{k}\in D,\tag{25}$$

where $\tau_{\infty}<+\infty$ is an assumed bound that depends on the regularity of ${\bf u}(t)$ in the time interval $[0,J\Delta t]$ and the accuracy of the numerical differentiation method.

Theorem 4.

Assume

$$\|\nabla H(\mathbf{x})\|_{2}\leq L<+\infty,\quad\mathbf{x}\in D\ \ a.e.\tag{26}$$

For any $r>0$ , under the condition (24), it holds that

[TABLE]

where the expectation $\mathbb{E}$ is taken over the random sequences of $\{{\bf x}_{k}\}_{k=1}^{K}$ , $\lambda$ is defined in (24), $L$ is the bound defined in (26), ${\bf T}_{L}(\mathbf{x})$ is defined by

[TABLE]

and $\Pi_{\mathbb{V}}(\nabla H)$ denotes the orthogonal projector of $\nabla H$ onto $\mathbb{V}$ , i.e., the best approximation to $\nabla H$ in $\mathbb{V}$ ,

[TABLE]

Proof.

See Appendix A. ∎

As a direct consequence, we have the following corollary.

Corollary 5.

Assume $\widetilde{L}:=\max\{\|\nabla H\|_{2,L^{\infty}},\|\nabla\widetilde{H}\|_{2,L^{\infty}}\}<+\infty.$ Then, for any $r>0$ , under the condition (24), the following result holds,

[TABLE]

We now discuss the error bound for the reconstructed Hamiltonian

[TABLE]

Note that the Hamiltonian is not unique and is defined up to an additive constant $C$ . Therefore, the error between $\widetilde{H}(\mathbf{x})$ and $H(\mathbf{x})$ should be understood in the quotient space ${\mathbb{H}^{1}_{\omega}}(D)/\mathbb{R}$ .

Theorem 6.

Assume $D$ is a bounded connected open subset of $\mathbb{R}^{2d}$ with Lipschitz boundary and let $d\omega=\frac{1}{\int_{D}d\mathbf{x}}d\mathbf{x}$ . Then, there exists a real constant $C$ such that

[TABLE]

where $C_{D,d}$ is a constant depending only on the domain $D$ and the dimensionality $d$ . Furthermore, under the assumptions of Corollary 5, we have

[TABLE]

Proof.

Let us take

[TABLE]

Using the Poincaré inequality (cf. [18]) we obtain (28). The proof is then completed upon combining (28) with Corollary 5. ∎

4 Numerical Examples

In this section we present numerical examples to demonstrate the properties and effectiveness of the proposed method.

In all the test cases, we generate synthetic trajectory data by solving the underlying Hamiltonian systems using a high resolution numerical solver. More specifically, we use the classical fourth-order explicit Runge-Kutta method (cf. [12, p. 131]) with a very small time step of size $0.0001\Delta t$ . The proposed numerical method is then applied to the data to produce the corresponding approximate Hamiltonian systems, whose solutions are then compared against the solutions to the true Hamiltonian systems to examine numerical errors. Note that in all of the tests the only available data are on the solution state variables. The time derivatives of the states are estimated numerically using the procedure discussed in Section 2.2.2.
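The data generation step can be sketched with a plain classical RK4 integrator that records the state every $\Delta t$ while taking many smaller internal steps. This is an illustrative sketch: the pendulum-type test system, the values of `dt` and `n_steps`, and the choice of 100 sub-steps (rather than the 10000 implied by a step of $0.0001\Delta t$ ) are assumptions made to keep the example fast.

```python
import math

def rk4_step(f, u, h):
    """One classical fourth-order Runge-Kutta step for du/dt = f(u)."""
    k1 = f(u)
    k2 = f([ui + 0.5 * h * ki for ui, ki in zip(u, k1)])
    k3 = f([ui + 0.5 * h * ki for ui, ki in zip(u, k2)])
    k4 = f([ui + h * ki for ui, ki in zip(u, k3)])
    return [ui + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]

def trajectory(f, u0, dt, n_steps, substeps=100):
    """Sample a trajectory every dt, using `substeps` RK4 steps per interval."""
    h = dt / substeps
    u, samples = list(u0), [list(u0)]
    for _ in range(n_steps):
        for _ in range(substeps):
            u = rk4_step(f, u, h)
        samples.append(u)
    return samples

# Illustrative pendulum-type system with H(p, q) = p**2 / 2 + (1 - cos q).
def pendulum(u):
    p, q = u
    return [-math.sin(q), p]

def H(u):
    p, q = u
    return 0.5 * p * p + (1.0 - math.cos(q))

data = trajectory(pendulum, [0.5, 1.0], dt=0.01, n_steps=40)
print(max(abs(H(u) - H(data[0])) for u in data))
```

The printed value is the largest deviation of the Hamiltonian along the generated samples, which gives a quick indication of how accurate the reference data are.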

For convenience, we assume the computational domain $D$ to be a hypercube. Without loss of generality, we employ polynomial basis functions in all the numerical examples. Specifically, we set the finite dimensional subspace $\mathbb{W}$ as ${\mathbb{P}}_{n}^{2d}$ , the linear space of $2d$ -dimensional polynomials of total degrees up to $n\geq 1$ . That is,

[TABLE]

where ${\bf i}=(i_{1},\dots,i_{2d})$ is a multi-index with $|{\bf i}|=i_{1}+\dots+i_{2d}$ . In all examples, we use the tensor products of univariate Legendre polynomials as a basis on the hypercube domain $D$ , which are commonly used in many practical applications. See, for example, [39]. Although the Legendre polynomials do not satisfy the orthogonality defined at the beginning of Section 3.2, our numerical results indicate that they are a good choice. Note that the solution to the least squares problem (15) is independent of the choice of basis.

The gradient function space $\mathbb{V}$ is defined via (13), and we have

[TABLE]

The basis functions of $\mathbb{V}$ are set as $\bm{\psi}_{j}(\mathbf{x})=\nabla\phi_{j}(\mathbf{x})$ , $j=1,\dots,N$ , where $\phi_{j}$ are the Legendre polynomials in $\mathbb{W}$ .
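The size of the approximation space can be checked by enumerating the multi-indices. The sketch below assumes that $N=\dim\mathbb{V}=\dim\mathbb{W}-1$ , which holds because constants are the only polynomials with zero gradient; the function names are hypothetical.

```python
import math
from itertools import product

def multi_indices(n, dim):
    """All multi-indices i in N^dim with |i| = i_1 + ... + i_dim <= n."""
    return [i for i in product(range(n + 1), repeat=dim) if sum(i) <= n]

def dim_W(n, dim):
    """Dimension of the space of dim-variate polynomials of total degree <= n."""
    return math.comb(n + dim, dim)

n, dim = 6, 2   # Example 1: degree n = 6, state dimension 2d = 2
indices = multi_indices(n, dim)
N = len(indices) - 1   # drop the constant, whose gradient vanishes
print(len(indices), dim_W(n, dim), N)
```

For larger state dimensions the count grows quickly, which is one reason the condition $N<K$ on the number of data pairs matters in practice.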

For noiseless data, we employ a second-order finite difference method to estimate the time derivatives. For noisy data, we use the polynomial least squares de-noising (9) with a polynomial degree of $Q=5$ . The details of the time derivative estimation are discussed in Section 2.2.2.

Once the approximate system (12) is constructed, we simulate its trajectories $\widetilde{\bf u}$ for some arbitrarily chosen initial state ${\bf u}_{0}^{*}$ , which is not in the training data (4), and then compare the errors against the trajectories ${\bf u}$ produced by the exact Hamiltonian system from the same initial state ${\bf u}_{0}^{*}$ . All errors are reported as relative errors in the following form for any $t\geq 0$ :

[TABLE]

Example 1: Single pendulum

The Hamiltonian of an ideal single pendulum with unit mass is its total energy

[TABLE]

where $l$ is the length of the pendulum, $q$ is the angular displacement of the pendulum from its downward equilibrium position, $p$ the angular momentum, and $g=9.8$ the gravitational constant. The true Hamiltonian formulation of the dynamics is

[TABLE]

We set $l=1$ and the computational domain $D=(-2\pi,2\pi)\times(-\pi,\pi)$ . The data pairs (4) consist of $M=500$ short trajectories, each of which is generated from a random initial state in $D$ and contains $J=40$ steps. Hereafter, the random initial states are independently drawn from the uniform distribution over $D$ . All data are then perturbed by a multiplicative factor $(1+\eta)$ , where $\eta$ is i.i.d. uniformly distributed in $[-0.08,0.08]$ . This corresponds to $\pm 8\%$ relative noise in all data.
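The noise model described above amounts to one independent uniform draw per data entry. The following is a minimal sketch using the standard library; the function name, the seed, and the sample values are illustrative assumptions.

```python
import random

def perturb(values, level=0.08, rng=random):
    """Multiply each entry by (1 + eta) with eta ~ Uniform[-level, level]."""
    return [v * (1.0 + rng.uniform(-level, level)) for v in values]

rng = random.Random(0)          # fixed seed for reproducibility
clean = [0.5, -1.2, 3.0]
noisy = perturb(clean, level=0.08, rng=rng)
print(noisy)
```

Because the noise is multiplicative, entries close to zero are perturbed very little in absolute terms, while large entries carry proportionally larger errors.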

The Hamiltonian $\widetilde{H}(\cdot)$ is approximated with polynomials of degree up to $n=6$ . The numerical solution of the approximate Hamiltonian system is denoted as $\widetilde{\bf u}(t;{\bf u}_{0})$ . To assess the accuracy of the algorithm, we set an arbitrarily chosen initial state $\mathbf{u}_{0}^{*}=(-3.876,-1.193)^{\top}$ and solve both the approximate solution $\widetilde{\bf u}(t;{\bf u}_{0}^{*})$ and the exact solution ${\bf u}(t;{\bf u}_{0}^{*})$ . For cross-comparison, we also implemented the equation approximation algorithm from [44], which directly approximates the right-hand-side of the unknown governing equations and thus, in general, does not preserve any Hamiltonian. We denote this solution $\widehat{\bf u}(t;{\bf u}_{0}^{*})$ .

In Fig. 1(a), we plot the evolution of the relative errors in the numerical solutions. We clearly observe that the errors in our structure-preserving (SP) algorithm are notably smaller than those of the non-SP algorithm from [44]. In Fig. 1(b), we examine the time evolution of the Hamiltonians. The exact Hamiltonian is conserved along the exact solution trajectory, i.e., $H({\bf u}(t;{\bf u}_{0}^{*}))=H({\bf u}_{0}^{*})$ . As expected from Theorem 1, the approximate Hamiltonian $\widetilde{H}$ of the recovered system (12) is also exactly preserved along its trajectory. The only (small) errors in the computed Hamiltonian arise from the ODE solver that we employ to numerically solve the reconstructed system; see Fig. 2 for the Hamiltonian deviation $\Delta\widetilde{H}(t):=\widetilde{H}(\widetilde{\bf u}(t;{\bf u}_{0}))-\widetilde{H}(\widetilde{\bf u}_{0})$ computed by the classical fourth-order explicit Runge-Kutta solver with different time step-sizes $\tau$ . We clearly observe that the errors in the Hamiltonian deviation decrease quickly as we reduce $\tau$ , and the errors are close to the level of round-off error when $\tau=2.5\times 10^{-4}$ . We also list the $L^{\infty}$ , $L^{2}$ , and total variation norms of the computed $\Delta\widetilde{H}(t)$ in Table 1, which shows that the errors in the Hamiltonian deviation converge to zero at an order related to the employed ODE solver. These results further confirm that the recovered system (12) does exactly preserve the approximate Hamiltonian $\widetilde{H}$ . Note that the non-SP method from [44], albeit quite accurate, generally does not preserve or relate to any Hamiltonian. For the present test case, we have verified that the system recovered by the non-SP method, denoted by

[TABLE]

is not a Hamiltonian system, because it does not satisfy $\nabla_{p}g+\nabla_{q}h=0$, so that there is no Hamiltonian $\widehat{H}(p,q)$ satisfying $-\nabla_{q}\widehat{H}=g$ and $\nabla_{p}\widehat{H}=h$. Note that the data are noisy, and the approximate functions $g(p,q)$ and $h(p,q)$ are not univariate, unlike the functions in the true system (29).
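The condition $\nabla_{p}g+\nabla_{q}h=0$ can be checked numerically for any planar vector field. The sketch below estimates the divergence by central differences on a grid. The two test fields are assumptions chosen for illustration: a pendulum field, which is Hamiltonian, and a hypothetical perturbation of it, which is not. Neither is the field recovered in [44].

```python
import math

def divergence(g, h, p, q, eps=1e-5):
    """Central-difference estimate of dg/dp + dh/dq at (p, q)."""
    dg_dp = (g(p + eps, q) - g(p - eps, q)) / (2 * eps)
    dh_dq = (h(p, q + eps) - h(p, q - eps)) / (2 * eps)
    return dg_dp + dh_dq

def max_divergence(g, h, lo=-1.0, hi=1.0, m=21):
    """Largest |divergence| over an m-by-m grid on [lo, hi]^2."""
    pts = [lo + (hi - lo) * i / (m - 1) for i in range(m)]
    return max(abs(divergence(g, h, p, q)) for p in pts for q in pts)

# Hamiltonian field from H = p**2/2 - cos(q): pdot = -sin(q), qdot = p.
def g_ham(p, q):
    return -math.sin(q)

def h_ham(p, q):
    return p

# Hypothetical non-Hamiltonian perturbation (illustrative only).
def g_pert(p, q):
    return -math.sin(q) - 0.01 * p

print(max_divergence(g_ham, h_ham))   # vanishes for the Hamiltonian field
print(max_divergence(g_pert, h_ham))  # about 0.01 for the perturbed field
```

On a simply connected domain a vanishing divergence is also sufficient for a Hamiltonian to exist, so even a small nonzero value rules one out.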

The advantage of the proposed SP algorithm is more notable in Fig. 3, where we present system predictions over a longer time. The SP method captures the phase of the solution much more accurately than the non-SP method in [44].

Note that the de-noising procedure (11) has been applied in the computation. For comparison, we also apply the proposed SP learning method without the de-noising procedure (11). The results are plotted in Fig. 4. A direct comparison of the numerical errors obtained by the two approaches is shown in Fig. 5. It is evident that the results obtained without the de-noising procedure are less accurate than those obtained with de-noising.
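The convergence orders listed in Table 1 follow from the usual formula $\log_{2}(e_{\tau}/e_{\tau/2})$ for a halved step size. The sketch below reproduces those orders from the tabulated errors and then repeats the same kind of study for the classical fourth-order Runge-Kutta solver. The second part uses a stand-in pendulum Hamiltonian $H=p^{2}/2-\cos q$ rather than the reconstructed $\widetilde{H}$, whose coefficients are not given here. The initial state, final time, and step sizes of the stand-in study are likewise assumptions.

```python
import math

def rhs(u):
    p, q = u
    return (-math.sin(q), p)  # pdot = -dH/dq, qdot = dH/dp

def H(u):
    p, q = u
    return 0.5 * p * p - math.cos(q)

def rk4_step(f, u, tau):
    """One step of the classical fourth-order explicit Runge-Kutta method."""
    k1 = f(u)
    k2 = f(tuple(x + 0.5 * tau * k for x, k in zip(u, k1)))
    k3 = f(tuple(x + 0.5 * tau * k for x, k in zip(u, k2)))
    k4 = f(tuple(x + tau * k for x, k in zip(u, k3)))
    return tuple(x + tau / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(u, k1, k2, k3, k4))

def max_deviation(u0, tau, T):
    """Maximum of |H(u(t)) - H(u0)| over the time grid (discrete L-infinity norm)."""
    u, h0, worst = u0, H(u0), 0.0
    for _ in range(round(T / tau)):
        u = rk4_step(rhs, u, tau)
        worst = max(worst, abs(H(u) - h0))
    return worst

def observed_order(err_coarse, err_fine, ratio=2.0):
    """Observed convergence order when the step size is divided by `ratio`."""
    return math.log(err_coarse / err_fine) / math.log(ratio)

# Orders recomputed from the two rows of Table 1.
print(round(observed_order(2.7493e-7, 2.1146e-8), 2))  # L-infinity column
print(round(observed_order(1.0668e-6, 8.3070e-8), 2))  # L2 column
print(round(observed_order(4.0899e-8, 1.2806e-9), 2))  # total-variation column

# Stand-in study (assumed setup): pendulum, u0 = (p, q) = (0, 1), T = 10.
u0 = (0.0, 1.0)
order_standin = observed_order(max_deviation(u0, 0.1, 10.0),
                               max_deviation(u0, 0.05, 10.0))
print(order_standin)
```

The deviation shrinks with $\tau$ in both cases, which is consistent with attributing it to the time integrator rather than to the reconstructed system.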

Example 2

We now consider the following Hamiltonian system

[TABLE]

whose Hamiltonian is

[TABLE]

We set the parameters $\alpha_{1}=1$ and $\alpha_{2}=1.1$ and the computational domain $D$ as $[-1,1]^{2}$. We use $M=300$ noiseless short trajectories, each of which contains $J=2$ intervals (i.e., 3 data points). The degree of the polynomials for approximation of the Hamiltonian is $n=6$. The reconstructed system is solved with an initial state ${\bf u}_{0}^{*}=(0.6,0.6)^{\top}$ and compared against the solution of the true system. The relative numerical errors in the solutions of the SP algorithm (denoted by $\widetilde{\bf u}$) and the non-SP algorithm from [44] (denoted by $\widehat{\bf u}$) are plotted in Fig. 6, along with the time evolution of the reconstructed Hamiltonian. The higher accuracy of the SP algorithm is again evident from the plot, as it induces smaller errors over long-term integration and preserves the approximate Hamiltonian $\widetilde{H}$ along its trajectory. In Fig. 7 and Fig. 8, we present the trajectories and the phase plots generated by the reconstructed system. The advantage of the new SP algorithm is again notable, as it preserves both the phase and the amplitude of the solution much better over long-term integration.
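Each short trajectory supplies only three states, from which time derivatives have to be estimated before any fitting can take place. As a hedged illustration, and not necessarily the estimator used in this paper, second-order finite differences on three equally spaced samples could look as follows. The function name and the scalar test signal $x(t)=\sin t$ are assumptions; for vector-valued states the same formulas would be applied component by component.

```python
import math

def three_point_derivatives(x0, x1, x2, dt):
    """Second-order finite-difference estimates of dx/dt at the three nodes."""
    d0 = (-3 * x0 + 4 * x1 - x2) / (2 * dt)  # one-sided (forward) at t0
    d1 = (x2 - x0) / (2 * dt)                # central at t1
    d2 = (x0 - 4 * x1 + 3 * x2) / (2 * dt)   # one-sided (backward) at t2
    return d0, d1, d2

# Test signal x(t) = sin(t) sampled at t0, t0 + dt, t0 + 2*dt.
t0, dt = 0.3, 0.01
xs = [math.sin(t0 + k * dt) for k in range(3)]
est = three_point_derivatives(xs[0], xs[1], xs[2], dt)
exact = [math.cos(t0 + k * dt) for k in range(3)]
errs = [abs(e - x) for e, x in zip(est, exact)]
print(errs)
```

All three estimates carry an $O(\Delta t^{2})$ truncation error, which acts as a deterministic (non-random) perturbation of the derivative data.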

Example 3: Hénon-Heiles problem

We now consider the Hénon-Heiles system [14],

[TABLE]

where the Hamiltonian is

[TABLE]

This system is used to describe the motion of stars around a galactic center. Chaotic behavior of the solution appears when the value of the Hamiltonian exceeds $1/8$ [14]. For our numerical tests, we choose the computational domain $D$ to be $[-1,1]^{4}$ and employ $M=500$ trajectories, each of which contains $J=2$ intervals. Polynomials of degree up to $n=3$ are used to approximate the Hamiltonian. The reconstructed system is solved with an initial state ${\bf u}_{0}^{*}=(0.3,-0.25,0.2,-0.25)^{\top}$ and compared against the true solution. The time evolution of the reconstructed Hamiltonian and the numerical error in the solution are plotted in Fig. 9. We observe small and stable numerical errors and good conservation of the Hamiltonian over relatively long-term integration.

In Fig. 10 and Fig. 11, the trajectory plots and phase plots for the reconstructed system using the new SP algorithm are presented, along with those from the true system as reference. The solutions exhibit non-trivial behavior, and the reconstructed system reproduces them accurately.
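For readers who wish to generate a reference solution, the sketch below integrates the Hénon-Heiles system with the classical fourth-order Runge-Kutta method and monitors the Hamiltonian. It rests on several assumptions: the textbook form $H=\tfrac{1}{2}(p_{1}^{2}+p_{2}^{2})+\tfrac{1}{2}(q_{1}^{2}+q_{2}^{2})+q_{1}^{2}q_{2}-\tfrac{1}{3}q_{2}^{3}$ is used in place of the displayed equations, the state is ordered as $(p_{1},p_{2},q_{1},q_{2})$, and the step size and final time are chosen arbitrarily.

```python
def H(u):
    """Textbook Henon-Heiles Hamiltonian; ordering (p1, p2, q1, q2) is assumed."""
    p1, p2, q1, q2 = u
    return (0.5 * (p1**2 + p2**2) + 0.5 * (q1**2 + q2**2)
            + q1**2 * q2 - q2**3 / 3)

def rhs(u):
    """pdot = -dH/dq, qdot = dH/dp."""
    p1, p2, q1, q2 = u
    return (-q1 - 2 * q1 * q2, -q2 - q1**2 + q2**2, p1, p2)

def rk4_step(f, u, tau):
    """One step of the classical fourth-order explicit Runge-Kutta method."""
    k1 = f(u)
    k2 = f(tuple(x + 0.5 * tau * k for x, k in zip(u, k1)))
    k3 = f(tuple(x + 0.5 * tau * k for x, k in zip(u, k2)))
    k4 = f(tuple(x + tau * k for x, k in zip(u, k3)))
    return tuple(x + tau / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(u, k1, k2, k3, k4))

u0 = (0.3, -0.25, 0.2, -0.25)  # initial state from the text
tau, T = 0.01, 20.0            # assumed step size and final time

u, max_dev = u0, 0.0
for _ in range(round(T / tau)):
    u = rk4_step(rhs, u, tau)
    max_dev = max(max_dev, abs(H(u) - H(u0)))

print(H(u0), max_dev)
```

Under the assumed ordering, $H({\bf u}_{0}^{*})\approx 0.1227$, which lies just below the threshold $1/8$; with the alternative ordering $(q_{1},q_{2},p_{1},p_{2})$ the value is about $0.1102$, also below the threshold.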

Example 4: Cherry problem

We now consider the Cherry Hamiltonian system [6],

[TABLE]

whose true Hamiltonian is

[TABLE]

We take the computational domain as $D=(-2,2)\times(-1,2)\times(-2,1)\times(-1,1)$ and use $M=500$ short trajectories, each of which contains $J=2$ intervals. Polynomials of degree up to $n=3$ are employed to approximate the Hamiltonian. The reconstructed system is then solved using an arbitrarily chosen initial state ${\bf u}_{0}^{*}=(-0.05,0.1,0.15,0.1)^{\top}$. The solutions are then compared against those from the true system with the same initial state. The relative errors in the numerical prediction are plotted in Fig. 12, along with the time evolution of the reconstructed Hamiltonian $\widetilde{H}$ and the true Hamiltonian $H$. We again observe good accuracy of the SP algorithm and conservation of the approximate Hamiltonian. The solution states are plotted in Fig. 13, and their phase plots in Fig. 14. The numerical solutions agree well with the true solutions.

Example 5: Double pendulum

Finally, we consider a double pendulum problem, as illustrated in Fig. 15.

Two masses $m_{1}$ and $m_{2}$ are connected via massless rigid rods of length $l_{1}$ and $l_{2}$ , and $\theta_{1}$ and $\theta_{2}$ are the angles of the two rods with respect to the vertical direction. We define the canonical momenta of the system as

[TABLE]

By letting $q_{1}=\theta_{1}$ and $q_{2}=\theta_{2}$ , the Hamiltonian of the system is

[TABLE]

where $g$ is the gravitational acceleration. The governing equations of the system are

[TABLE]

where

[TABLE]

In the numerical experiment, we set $m_{1}=m_{2}=l_{1}=l_{2}=1$ and $g=9.8$, and set the computational domain as $D=(-5,5)\times(-4,4)\times(-1,1)\times(-1,1)$. The Hamiltonian in this example is notably more complicated than the ones in the previous examples. Consequently, we employ a higher-order polynomial, of degree up to $n=15$, to conduct the approximation. The data set includes $M=20,000$ short trajectories, each of which contains $J=2$ intervals. The reconstructed Hamiltonian system is then solved with an arbitrarily chosen initial state ${\bf u}_{0}^{*}=(0,0,\frac{\pi}{6},\frac{\pi}{4})^{\top}$ for up to $T=20$. Its solution is compared against the reference solution from the true system with the same initial state. Fig. 16 shows the evolution of the reconstructed Hamiltonian and the true one, which remain constant as expected. The time evolution of the solution states is plotted in Fig. 17. We observe good agreement with the true solution. The corresponding phase plots are further displayed in Fig. 18, where we also plot the results obtained by using higher-order polynomial approximations ($n=16$ with $M=20,000$ and $n=18$ with $M=60,000$) to show the convergence. We see that the evolution of the trajectories is predicted more accurately by the reconstructed Hamiltonian system obtained with higher-degree polynomials. The relative numerical errors, shown in Fig. 19, further validate the convergence behavior.
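The growth of the data set with the polynomial degree can be put in perspective by counting the unknown coefficients. The sketch below counts the monomials of total degree at most $n$ in $d$ variables. Dropping the constant term, which has zero gradient and cannot be determined from derivative data, is an assumption about how the approximation space is set up, and `num_basis` is a hypothetical helper name.

```python
from math import comb

def num_basis(n, d, drop_constant=True):
    """Number of monomials of total degree <= n in d variables."""
    total = comb(n + d, d)
    return total - 1 if drop_constant else total

# (degree n, dimension d) pairs appearing in the numerical examples.
for n, d in [(6, 2), (3, 4), (15, 4), (16, 4), (18, 4)]:
    print(n, d, num_basis(n, d))
```

For the double pendulum ($d=4$) this gives 3875, 4844, and 7314 coefficients for $n=15$, $16$, and $18$, compared with 34 for the degree-3 approximations of Examples 3 and 4.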

5 Conclusion

We presented a structure-preserving numerical method for reconstructing unknown Hamiltonian systems using observation data. The key ingredient of the method is to approximate the unknown Hamiltonian first and then derive the approximate equations from the reconstructed Hamiltonian. By doing so, the reconstructed system preserves the approximate Hamiltonian along its trajectories, a property that is desired in many practical applications. We presented the algorithm and its error estimate in a special case, and used a variety of examples to demonstrate the effectiveness of the approach. In its current form, polynomials are used to construct the approximation. Other forms of approximation, such as neural networks, will be explored in a separate work.
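The conservation property follows from a short chain-rule argument. As a sketch, assume the reconstructed system takes the canonical form $\dot{p}=-\nabla_{q}\widetilde{H}$, $\dot{q}=\nabla_{p}\widetilde{H}$, which is the sign convention used in Example 1. Differentiating $\widetilde{H}$ along a trajectory then gives

$$
\frac{d}{dt}\widetilde{H}\big(p(t),q(t)\big)
=\nabla_{p}\widetilde{H}\cdot\dot{p}+\nabla_{q}\widetilde{H}\cdot\dot{q}
=-\nabla_{p}\widetilde{H}\cdot\nabla_{q}\widetilde{H}+\nabla_{q}\widetilde{H}\cdot\nabla_{p}\widetilde{H}
=0 .
$$

Hence $\widetilde{H}$ is constant along exact trajectories of the reconstructed system, whatever the accuracy of the approximation $\widetilde{H}\approx H$, and any deviation observed in computations comes from the time integrator.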

Appendix A Proof of Theorem 4

The technique used in the following proof is similar to that in the proof of Theorem 3 of [7]. However, our Theorem 4 applies to the vector-valued function $\nabla H$ in the gradient function space. This prevents direct use, in a component-by-component manner, of the result from [7], which applies only to scalar-valued functions. Also, our analysis incorporates the numerical errors induced by estimating the time derivatives $\dot{\bf x}_{k}$. These numerical errors often do not follow a random distribution. Consequently, we do not impose an i.i.d. assumption on the errors, in contrast to the work of [7]. Because of these subtle, yet significant, differences, we include the proof of Theorem 4 here for completeness.

Proof.

Let $d\omega^{K}=\otimes^{K}d\omega$ be the probability measure of the random sequence $\{\mathbf{x}_{k}\}_{k=1}^{K}$ . Let $\Omega$ be the set of all possible draws, which is divided into the set $\Omega_{+}$ of all draws such that

[TABLE]

and the complement set $\Omega_{-}:=\Omega\setminus\Omega_{+}$ . We consider the following splitting

[TABLE]

We now estimate the upper bounds for $I_{1}$ and $I_{2}$ .

Let us first consider $I_{2}$ . Based on Corollary 3 and under the condition (24), we have

[TABLE]

Note that

[TABLE]

Therefore, we have

[TABLE]

We now consider $I_{1}$ . For every $\mathbf{x}\in D$ , if $\|\nabla\widetilde{H}(\mathbf{x})\|_{2}\leq L$ , then

[TABLE]

so that

[TABLE]

For almost every $\mathbf{x}\in D$ with respect to $\omega(\mathbf{x})$ , if $\|\nabla\widetilde{H}(\mathbf{x})\|_{2}>L$ , then

[TABLE]

which implies

[TABLE]

Therefore, we have

[TABLE]

for almost every $\mathbf{x}\in D$ with respect to $\omega(\mathbf{x})$ . It follows that

[TABLE]

Let us rewrite the derivative data as

[TABLE]

where ${\bm{\tau}}_{k}$ denotes the error in the estimated derivative. Define $\nabla G:=\nabla H-\Pi_{\mathbb{V}}(\nabla H)$ . Similar to [7], one can write

[TABLE]

where

[TABLE]

and

[TABLE]

Then, we have

[TABLE]

where $\bm{\xi}=(\xi_{1},\dots,\xi_{N})^{\top}$ and $\bm{\eta}=(\eta_{1},\cdots,\eta_{N})^{\top}$ are respectively the solutions of the two systems

[TABLE]

with the matrix $\bf A$ defined in (19), ${\bf y}=(y_{1},\dots,y_{N})^{\top}$ , ${\bf z}=(z_{1},\dots,z_{N})^{\top}$ , and

[TABLE]

When the draw $\{\mathbf{x}_{k}\}_{k=1}^{K}$ belongs to $\Omega_{+}$, we have (34), which yields $\|{\bf A}^{-1}\|\leq 2$ and

[TABLE]

Hence

[TABLE]

For each $1\leq j\leq N$ , we estimate $\mathbb{E}\big{(}y_{j}^{2}\big{)}$ as follows:

[TABLE]

where the Cauchy–Schwarz inequality has been used in the inequality. It follows that

[TABLE]

We now estimate $\mathbb{E}\big{(}z_{j}^{2}\big{)}$ for each $1\leq j\leq N$ .

[TABLE]

It follows from (39) that

[TABLE]

which together with (37) completes the proof. ∎

Bibliography


[1] E. Bairey, I. Arad, and N. H. Lindner, Learning a local Hamiltonian from local measurements, Phys. Rev. Lett., 122 (2019), p. 020504.
[2] J. Bongard and H. Lipson, Automated reverse engineering of nonlinear dynamical systems, Proc. Natl. Acad. Sci. U.S.A., 104 (2007), pp. 9943–9948.
[3] S. L. Brunton, B. W. Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, Chaos as an intermittently forced linear system, Nature Communications, 8 (2017).
[4] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. U.S.A., 113 (2016), pp. 3932–3937.
[5] R. Chartrand, Numerical differentiation of noisy, nonsmooth data, ISRN Applied Mathematics, 2011 (2011).
[6] T. M. Cherry, On periodic solutions of Hamiltonian systems of differential equations, Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 227 (1928), pp. 137–221.
[7] A. Cohen, M. A. Davenport, and D. Leviatan, On the stability and accuracy of least squares approximations, Foundations of Computational Mathematics, 13 (2013), pp. 819–834.
[8] A. Cohen, M. A. Davenport, and D. Leviatan, Correction to: On the stability and accuracy of least squares approximations, Foundations of Computational Mathematics, 19 (2019), pp. 239–239.
