A Geometric Approach to Dynamical Model-Order Reduction

Florian Feppon; Pierre F.J. Lermusiaux

arXiv:1705.08521·math.DS·April 4, 2018


TL;DR

This paper introduces a geometric framework for model-order reduction of stochastic PDEs, analyzing the manifold of fixed rank matrices and deriving dynamical systems that optimize low-rank approximations.

Contribution

It provides a detailed geometric analysis of the fixed rank matrix manifold and develops explicit dynamical systems for low-rank approximation and model reduction.

Findings

01 The curvature of the fixed rank matrix manifold is characterized.

02 Error bounds for the DO approximation are established based on singular value gaps.

03 Algorithms for adaptive low-rank matrix approximation are proposed.

Abstract

Any model order reduced dynamical system that evolves a modal decomposition to approximate the discretized solution of a stochastic PDE can be related to a vector field tangent to the manifold of fixed rank matrices. The Dynamically Orthogonal (DO) approximation is the canonical reduced order model for which the corresponding vector field is the orthogonal projection of the original system dynamics onto the tangent spaces of this manifold. The embedded geometry of the fixed rank matrix manifold is thoroughly analyzed. The curvature of the manifold is characterized and related to the smallest singular value through the study of the Weingarten map. Differentiability results for the orthogonal projection onto embedded manifolds are reviewed and used to derive an explicit dynamical system for tracking the truncated Singular Value Decomposition (SVD) of a time-dependent matrix. It is…


Tables1

$\mathcal{M}_{l,m}$	Space of $l$-by-$m$ real matrices
$\mathcal{M}_{m,r}^{*}$	Space of $m$-by-$r$ matrices that have full rank
$\mathrm{rank}(R)$	Rank of a matrix $R\in\mathcal{M}_{l,m}$
$\mathscr{M}=\{R\in\mathcal{M}_{l,m}\mid\mathrm{rank}(R)=r\}$	Fixed rank matrix manifold
$\mathcal{O}_{r}=\{P\in\mathcal{M}_{r,r}\mid P^{T}P=I\}$	Group of $r$-by-$r$ orthogonal matrices
$\mathrm{St}_{l,r}=\{U\in\mathcal{M}_{l,r}\mid U^{T}U=I\}$	Stiefel manifold
$R=UZ^{T}$	Point $R\in\mathscr{M}$ with $U\in\mathrm{St}_{l,r}$ and $Z\in\mathcal{M}_{m,r}^{*}$
$\mathcal{T}(R)$	Tangent space at $R\in\mathscr{M}$
$X\in\mathcal{T}(R)$	Tangent vector $X$ at $R=UZ^{T}$
$\mathcal{H}_{(U,Z)}$	Horizontal space at $R=UZ^{T}$
$(X_{U},X_{Z})\in\mathcal{H}_{(U,Z)}$	$X=X_{U}Z^{T}+UX_{Z}^{T}\in\mathcal{T}(R)$ with $X_{U}\in\mathcal{M}_{l,r}$, $X_{Z}\in\mathcal{M}_{m,r}$ and $U^{T}X_{U}=0$
$\Pi_{\mathcal{T}(R)}$	Orthogonal projection onto the plane $\mathcal{T}(R)$
$\mathrm{Sk}(\mathscr{M})$	Skeleton of $\mathscr{M}$
$\Pi_{\mathscr{M}}$	Orthogonal projection onto $\mathscr{M}$ (defined on $\mathcal{M}_{l,m}\setminus\mathrm{Sk}(\mathscr{M})$)
$I$	Identity mapping
$A^{T}$	Transpose of a square matrix $A$
$\langle A,B\rangle=\mathrm{Tr}(A^{T}B)$	Frobenius matrix scalar product
$\|A\|=\mathrm{Tr}(A^{T}A)^{1/2}$	Frobenius norm
$\sigma_{1}(A)\geq\dots\geq\sigma_{\mathrm{rank}(A)}(A)$	Nonzero singular values of $A\in\mathcal{M}_{l,m}$
$\dot{R}=\mathrm{d}R/\mathrm{d}t$	Time derivative of a trajectory $R(t)$
$\mathrm{D}_{X}f(R)$	Differential of a function $f$ in direction $X$
$\mathrm{D}\Pi_{\mathcal{T}(R)}(X)\cdot Y$	Differential of the projection operator $\Pi_{\mathcal{T}(R)}$ applied to $Y$

Equations188

$$\partial_{t}\bm{u}=\mathscr{L}(t,\bm{u};\omega),$$

$$\dot{\mathfrak{R}}=\mathcal{L}(t,\mathfrak{R}),$$

$$\bm{u}(t,x;\omega)\simeq\bm{u}_{\textrm{DO}}=\sum_{i=1}^{r}\zeta_{i}(t,\omega)\bm{u}_{i}(t,x),$$

$$R(\rho,\theta,\phi)=\rho\begin{pmatrix}\sin(\theta)\sin(\phi)&\cos(\theta)\sin(\phi)\\ \sin(\theta)\cos(\phi)&\cos(\theta)\cos(\phi)\end{pmatrix},\quad\rho>0,\ \theta\in[0,2\pi],\ \phi\in[0,2\pi],$$

$$\dot{R}=L(t,R)\in\mathcal{T}(R),$$

$$\mathrm{D}_{X}f(R)=\left.\frac{\mathrm{d}}{\mathrm{d}t}f(R(t))\right|_{t=0}=\lim_{\Delta t\rightarrow 0}\frac{f(R(t+\Delta t))-f(R(t))}{\Delta t},$$

$$\mathrm{D}\Pi_{\mathcal{T}(R)}(X)\cdot Y=\left[\left.\frac{\mathrm{d}}{\mathrm{d}t}\Pi_{\mathcal{T}(R(t))}\right|_{t=0}\right](Y)=\left[\lim_{\Delta t\rightarrow 0}\frac{\Pi_{\mathcal{T}(R(t+\Delta t))}-\Pi_{\mathcal{T}(R(t))}}{\Delta t}\right](Y),$$

$$\mathscr{M}=\{R\in\mathcal{M}_{l,m}\mid\mathrm{rank}(R)=r\}.$$

$$U_{1}Z_{1}^{T}=U_{2}Z_{2}^{T}\Leftrightarrow\exists P\in\mathcal{O}_{r},\ U_{1}=U_{2}P\textrm{ and }Z_{1}=Z_{2}P.$$

$$\mathcal{T}(UZ^{T})=\{X_{U}Z^{T}+UX_{Z}^{T}\mid X_{U}\in\mathcal{M}_{l,r},\ X_{Z}\in\mathcal{M}_{m,r},\ U^{T}X_{U}+X_{U}^{T}U=0\}.$$

$$\mathcal{H}_{(U,Z)}=\{(X_{U},X_{Z})\in\mathcal{M}_{l,r}\times\mathcal{M}_{m,r}\mid U^{T}X_{U}=0\},$$

$$X=\dot{U}Z^{T}+U\dot{Z}^{T}=U(\dot{Z}^{T}+U^{T}\dot{U}Z^{T})+((I-UU^{T})\dot{U})Z^{T}=X_{U}Z^{T}+UX_{Z}^{T},$$

$$g((X_{U},X_{Z}),(Y_{U},Y_{Z}))=\langle X_{U}Z^{T}+UX_{Z}^{T},\,Y_{U}Z^{T}+UY_{Z}^{T}\rangle=\mathrm{Tr}(Z^{T}ZX_{U}^{T}Y_{U}+X_{Z}^{T}Y_{Z}).$$

$$\begin{array}{ccccc}\Pi_{\mathcal{T}(UZ^{T})}&:&\mathcal{M}_{l,m}&\rightarrow&\mathcal{H}_{(U,Z)}\\ &&\mathfrak{X}&\mapsto&((I-UU^{T})\mathfrak{X}Z(Z^{T}Z)^{-1},\mathfrak{X}^{T}U).\end{array}$$

$$\forall\Delta\in\mathcal{M}_{l,r},\ \Delta^{T}U=0\Rightarrow\frac{\partial J}{\partial X_{U}}\cdot\Delta=-\langle\mathfrak{X}-X_{U}Z^{T}-UX_{Z}^{T},\Delta Z^{T}\rangle=0,$$

$$\forall\Delta\in\mathcal{M}_{m,r},\ \frac{\partial J}{\partial X_{Z}}\cdot\Delta=-\langle\mathfrak{X}-X_{U}Z^{T}-UX_{Z}^{T},U\Delta^{T}\rangle=0,$$

$$\mathcal{N}(R)=\{N\in\mathcal{M}_{l,m}\mid U^{T}N=0\textrm{ and }NZ=0\}.$$

$$UZ^{T}=\sum_{i=1}^{r}\sigma_{i}u_{i}v_{i}^{T}\textrm{ and }N=\sum_{i=1}^{k}\sigma_{r+i}u_{r+i}v_{r+i}^{T}.$$

$$\nabla_{X}Y=\Pi_{\mathcal{T}(R)}(\mathrm{D}_{X}Y).$$

$$\Gamma(X,Y)=-(I-\Pi_{\mathcal{T}(R)})\mathrm{D}_{X}Y=-\mathrm{D}\Pi_{\mathcal{T}(R)}(X)\cdot Y.$$

$$\nabla_{\dot{R}}\dot{R}=\ddot{R}-\mathrm{D}\Pi_{\mathcal{T}(R)}(\dot{R})\cdot\dot{R}=0.$$

$$\nabla_{X}Y=\left(\mathrm{D}_{X}Y_{U}+UX_{U}^{T}Y_{U}+(X_{U}Y_{Z}^{T}+Y_{U}X_{Z}^{T})Z(Z^{T}Z)^{-1},\ \mathrm{D}_{X}Y_{Z}-ZY_{U}^{T}X_{U}\right).$$

$$\left\{\begin{array}{rl}\ddot{U}+U\dot{U}^{T}\dot{U}+2\dot{U}\dot{Z}^{T}Z(Z^{T}Z)^{-1}&=0\\ \ddot{Z}-Z\dot{U}^{T}\dot{U}&=0.\end{array}\right.$$

$$\mathrm{D}_{X}Y=\mathrm{D}_{X}Y_{U}Z^{T}+U\mathrm{D}_{X}Y_{Z}^{T}+X_{U}Y_{Z}^{T}+Y_{U}X_{Z}^{T}.$$

$$\nabla_{X}Y=\Pi_{(U,Z)}(\mathrm{D}_{X}Y)=\left((I-UU^{T})\mathrm{D}_{X}YZ(Z^{T}Z)^{-1},\ (\mathrm{D}_{X}Y)^{T}U\right),$$

$$\nabla_{X}Y=\left((I-UU^{T})\mathrm{D}_{X}(Y_{U})+(X_{U}Y_{Z}^{T}+Y_{U}X_{Z}^{T})Z(Z^{T}Z)^{-1},\ \mathrm{D}_{X}(Y_{Z})+Z\mathrm{D}_{X}(Y_{U}^{T})U\right).$$

$$\begin{array}{cccc}\exp_{UZ^{T}}:&\mathcal{T}(UZ^{T})&\rightarrow&\mathscr{M}\\ &X&\mapsto&R(1),\end{array}$$

$$\tau_{RR(1)}X=\dot{U}(1)Z(1)^{T}+U(1)\dot{Z}(1)^{T},$$


Full text


A geometric approach to dynamical model–order reduction

Florian Feppon

Pierre F.J. Lermusiaux, MSEAS, Massachusetts Institute of Technology

Abstract

Any model order reduced dynamical system that evolves a modal decomposition to approximate the discretized solution of a stochastic PDE can be related to a vector field tangent to the manifold of fixed rank matrices. The Dynamically Orthogonal (DO) approximation is the canonical reduced order model for which the corresponding vector field is the orthogonal projection of the original system dynamics onto the tangent spaces of this manifold. The embedded geometry of the fixed rank matrix manifold is thoroughly analyzed. The curvature of the manifold is characterized and related to the smallest singular value through the study of the Weingarten map. Differentiability results for the orthogonal projection onto embedded manifolds are reviewed and used to derive an explicit dynamical system for tracking the truncated Singular Value Decomposition (SVD) of a time-dependent matrix. It is demonstrated that the error made by the DO approximation remains controlled under the minimal condition that the original solution stays close to the low rank manifold, which translates into an explicit dependence of this error on the gap between singular values. The DO approximation is also justified as the dynamical system that applies instantaneously the SVD truncation to optimally constrain the rank of the reduced solution. Riemannian matrix optimization is investigated in this extrinsic framework to provide algorithms that adaptively update the best low rank approximation of a smoothly varying matrix. The related gradient flow provides a dynamical system that converges to the truncated SVD of an input matrix for almost every initial data.

keywords:

Model order reduction, fixed rank matrix manifold, low rank approximation, Singular Value Decomposition, orthogonal projection, curvature, Weingarten map, Dynamically Orthogonal approximation, Riemannian matrix optimization.

AMS subject classifications:

65C20, 53B21, 65F30, 15A23, 53A07, 35R60, 65M15

1 Introduction

Finding efficient model order reduction methods is an issue commonly encountered in a wide variety of domains involving intensive computations and expensive high-fidelity simulations [66, 61, 38, 13]. Such domains include uncertainty quantification [25, 46, 69, 72], dynamical systems analysis [30, 9, 80], electrical engineering [24, 8], mechanical engineering [54], ocean and weather predictions [41, 49, 12, 62], chemistry [55], and biology [40], to name a few. The computational costs and challenges arise from the complexity of the mathematical models as well as from the need to represent variations of parameter values and the dominant uncertainties involved. For example, to quantify uncertainties of dynamical system fields, one often needs to solve stochastic partial differential equations (PDEs),

$$\partial_{t}\bm{u}=\mathscr{L}(t,\bm{u};\omega),\tag{1}$$

where $t$ is time, $\bm{u}$ the uncertain dynamical fields, $\mathscr{L}$ a differential operator, and $\omega$ a random event. For deterministic but parametric dynamical systems, $\omega$ may represent a large set of possible parameter values that need to be accounted for by the model-order reduction. Generally, after both spatial and stochastic/parametric event discretization of the PDE Eq. 1, or more directly if the focus is on solving a complex system of ordinary differential equations (ODEs), one is interested in the numerical solution of a large system of ODEs of the form

$$\dot{\mathfrak{R}}=\mathcal{L}(t,\mathfrak{R}),\tag{2}$$

where $\mathcal{L}$ is an operator acting on the space of $l$-by-$m$ matrices $\mathfrak{R}$. In the case of a direct Monte-Carlo approach for the resolution of the stochastic PDE Eq. 1, $\mathcal{L}$ can be thought of as the discretization of the differential operator $\mathscr{L}$ using $l$ spatial nodes and $m$ Monte-Carlo realizations or parameter values. Accurate quantification of the statistical/parametric properties of the original solution $\bm{u}$ often requires solving system Eq. 2 with both a high spatial resolution, $l$, and a high number of realizations, $m$. Hence, solving Eq. 2 directly with a Monte-Carlo approach quickly becomes intractable for realistic, real-time applications such as ocean and weather predictions [57, 44] or real-time control [39, 47].

A method to address this challenge is to assume the existence of an approximation $\bm{u}_{\textrm{DO}}$ of the solution $\bm{u}$ onto a finite number of $r$ spatial modes, $\bm{u}_{i}(t,x)$ , and stochastic coefficients, $\zeta_{i}(t,\omega)$ (here assumed to be both time-dependent [44, 64]),

$$\bm{u}(t,x;\omega)\simeq\bm{u}_{\textrm{DO}}=\sum_{i=1}^{r}\zeta_{i}(t,\omega)\bm{u}_{i}(t,x),\tag{3}$$

and look for a dynamical system that would most accurately govern the evolution of these dominant modes and coefficients. The optimal approximation (in the sense that the $L^{2}$ error $\mathbb{E}[||\bm{u}-\bm{u}_{\textrm{DO}}||^{2}]^{1/2}$ is minimized) is achieved by the Karhunen-Loève (KL) decomposition [48, 30], whose first $r$ modes yield an optimal orthonormal basis $(\bm{u}_{i})$. Many methods, such as polynomial chaos expansions [82], Fourier decomposition [79], or Proper Orthogonal Decomposition [30], rely on the choice of a predefined, time-independent orthonormal basis either for the modes, $(\bm{u}_{i})$, or the coefficients, $(\zeta_{i})$, and obtain equations for the respective unknown coefficients or modes by Galerkin projection [58]. However, the use of modes and coefficients that are simultaneously dynamic has been shown to be efficient [44, 45]. Dynamically Orthogonal (DO) field equations [64, 65] were thus derived to evolve this decomposition adaptively for a general differential operator $\mathscr{L}$, and they enabled efficient simulations of stochastic Navier-Stokes equations [73].

At the discrete level, the decomposition Eq. 3 is written $\mathfrak{R}\simeq R=UZ^{T}$ where $R$ is a rank $r$ approximation of the full rank matrix $\mathfrak{R}$, decomposed as the product of an $l$-by-$r$ matrix $U$ containing the discretization of the basis functions, $(\bm{u}_{i})$, and of an $m$-by-$r$ matrix $Z$ containing the realizations of the stochastic coefficients, $(\zeta_{i})$. It is well known that such an approximation is optimal (in the Frobenius norm) when $R=UZ^{T}$ is obtained by truncating the Singular Value Decomposition (SVD), i.e. by selecting the columns of $U$ to be the left singular vectors associated with the $r$ largest singular values of $\mathfrak{R}$ and setting $Z=\mathfrak{R}^{T}U$ [32, 31]. In 2007, Koch and Lubich [37] proposed a method, inspired by the Dirac-Frenkel variational principle in quantum physics, to dynamically evolve a rank $r$ matrix $R=UZ^{T}$ that approximates the solution of the full dynamical system Eq. 2. The main principle of the method lies in the intuition that one can optimally update the low-rank approximation $R$ by projecting $\mathcal{L}(t,R)$ onto the tangent space of the manifold of low rank matrices. Recently, Musharbash [52] noticed the parallel with the DO method, and applied the results obtained in [37] to analyze the error committed by the DO approximation for a stochastic heat equation. In fact, in the same way that the KL expansion is the continuous analogue of the SVD, the discretization of the DO decomposition [64] is strictly equivalent to the dynamical low rank approximation of Koch and Lubich [37] when the discretization reduces to simulating the matrix dynamical system Eq. 2 of $m$ realizations spatially resolved with $l$ nodes.
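As an illustration of the truncated SVD described above, the following minimal NumPy sketch builds the factors $U$ and $Z=\mathfrak{R}^{T}U$ for a random test matrix and compares the Frobenius error with the norm of the discarded singular values. The function name `truncated_svd_factors`, the matrix sizes, and the random data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def truncated_svd_factors(R_full, r):
    """Return (U, Z, s): U has orthonormal columns, U @ Z.T is the rank-r
    truncated SVD of R_full, and s holds all singular values of R_full."""
    # Thin SVD: R_full = W @ diag(s) @ Vt, singular values sorted in descending order.
    W, s, Vt = np.linalg.svd(R_full, full_matrices=False)
    U = W[:, :r]          # l-by-r mode matrix (leading left singular vectors)
    Z = R_full.T @ U      # m-by-r coefficient matrix, Z = R_full^T U
    return U, Z, s

# Hypothetical sizes and data, chosen only for the demonstration.
rng = np.random.default_rng(0)
l, m, r = 8, 6, 2
R_full = rng.standard_normal((l, m))

U, Z, s = truncated_svd_factors(R_full, r)
err = np.linalg.norm(R_full - U @ Z.T)    # Frobenius norm of the truncation error
tail = np.sqrt(np.sum(s[r:] ** 2))        # norm of the discarded singular values
print(err, tail)
```

The two printed numbers agree up to round-off, which is the Frobenius-norm optimality statement (Eckart-Young) in numerical form.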

Simultaneously, new approaches have emerged since the 1990s for optimization on matrix sets [18, 3]. The application of Riemannian geometry to manifolds of matrices has allowed the development of new optimization algorithms that evolve orthogonality constraints geometrically rather than using more classical techniques, such as Lagrange multiplier methods [18]. Matrix dynamical systems that continuously perform matrix operations, such as inversion, eigen- or SVD-decompositions, steepest descents, and gradient optimization, have thus been proposed [10, 14, 70]. These continuous-time systems were extended and applied to adaptive uncertainty predictions, learning of dominant subspaces, and data assimilation [42, 43].

The purpose of this article is to extend the analysis and the understanding of the DO method in the matrix framework as initiated by [37] and in the above works, by furthering its relation to the Singular Value Decomposition and its geometric interpretation as a constrained dynamics on the manifold $\mathscr{M}$ of fixed rank matrices. In the vein of [18, 3, 50], this article utilizes the point of view of differential geometry. To provide a visual intuition, a 3D projection of two 2-dimensional subsurfaces of the manifold $\mathscr{M}$ of rank one 2-by-2 matrices is visible on Fig. 1. This figure has been obtained by using the parameterization

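$$R(\rho,\theta,\phi)=\rho\begin{pmatrix}\sin(\theta)\sin(\phi)&\cos(\theta)\sin(\phi)\\ \sin(\theta)\cos(\phi)&\cos(\theta)\cos(\phi)\end{pmatrix},\quad\rho>0,\ \theta\in[0,2\pi],\ \phi\in[0,2\pi],$$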

on $\mathscr{M}$ and projecting orthogonally two subsurfaces by plotting the first three elements $R_{11},R_{12}$ and $R_{21}$. Since the multiplication of singular values by a non-zero constant does not affect the rank of a matrix, $\mathscr{M}\subset\mathcal{M}_{2,2}$ is a cone, which is consistent with the increase of curvature visible in Fig. 1(a) near the origin. More generally, $\mathscr{M}$ is the union of $r$-dimensional affine subspaces of $\mathcal{M}_{l,m}$ supported by the manifold of strictly lower rank matrices. It will actually be proven in Section 4 that the curvature of $\mathscr{M}$ is inversely proportional to the lowest singular value, and therefore diverges as matrices approach a rank strictly less than $r$. Hence $\mathscr{M}$ can be understood either as a collection of cones (Fig. 1(b)) or as a multidimensional spiral (Fig. 1(a)).
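The rank-one and cone properties of this parameterization can be checked numerically. The sketch below assumes the four trigonometric entries fill the 2-by-2 matrix row by row (an assumption about the layout, although either ordering gives a rank-one outer product); the function name `rank_one_2x2` and the sample angles are illustrative.

```python
import numpy as np

def rank_one_2x2(rho, theta, phi):
    """rho times the outer product of (sin phi, cos phi) and (sin theta, cos theta).
    The row-by-row layout of the entries is an assumption."""
    return rho * np.array([
        [np.sin(theta) * np.sin(phi), np.cos(theta) * np.sin(phi)],
        [np.sin(theta) * np.cos(phi), np.cos(theta) * np.cos(phi)],
    ])

rho, theta, phi = 1.5, 0.7, 2.1           # arbitrary sample point
R = rank_one_2x2(rho, theta, phi)

det_R = np.linalg.det(R)                   # vanishes for a rank-one matrix
sing = np.linalg.svd(R, compute_uv=False)  # singular values in descending order

# Cone property: scaling rho moves along a ray and stays on the manifold.
R_scaled = rank_one_2x2(2.0 * rho, theta, phi)
print(det_R, sing, np.allclose(R_scaled, 2.0 * R))
```

Because both factors of the outer product are unit vectors, the single nonzero singular value equals $\rho$, so $\rho$ measures the distance to the apex of the cone.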

Geometrically, a dynamical system Eq. 2 can be seen as a time dependent vector field $\mathcal{L}$ that assigns the velocity $\mathcal{L}(t,\mathfrak{R})$ at time $t$ to each point $\mathfrak{R}$ of the ambient space $\mathcal{M}_{l,m}$ of $l$-by-$m$ matrices (Fig. 2(a)). Similarly, any rank $r$ model order reduction can be viewed as a vector field $L$ that must be everywhere tangent to the manifold $\mathscr{M}$ of rank $r$ matrices. The corresponding dynamical system is

$$\dot{R}=L(t,R)\in\mathcal{T}(R),$$

where ${\mathcal{T}(R)}$ denotes the tangent space of $\mathscr{M}$ at $R$ .

From this point of view, “combing the hair” formed by the original vector field $\mathcal{L}$ on the manifold $\mathscr{M}$, by setting $L(t,R)$ to the time-dependent orthogonal projection of each vector $\mathfrak{X}=\mathcal{L}(t,R)$ onto each tangent space ${\mathcal{T}(R)}$, is nothing less than the DO approximation (Fig. 2(b)). As such, the DO-reduced dynamical system is optimal in the sense that the resulting vector field $L$ is the best dynamic tangent approximation of $\mathcal{L}$ at every point $R\in\mathscr{M}$.

Analyzing the error committed by the DO approximation can be done by understanding how the best rank $r$ approximation of the solution $\mathfrak{R}$ evolves [37, 52]. This requires the time derivative of the truncated SVD as a function of $\dot{\mathfrak{R}}$. Nevertheless, to the best of our knowledge, no explicit expression of the dynamical system satisfied by the best low rank approximation has been obtained in the literature. To address this gap, this article brings forward the following novelties. First, a more exhaustive study of the extrinsic geometry of the fixed rank manifold $\mathscr{M}$ is provided. This includes the characterization and derivation of principal curvatures and of their direct relation to singular values. Second, the geometric interpretation of the truncated SVD as an orthogonal projection onto $\mathscr{M}$ is utilized, so as to apply existing results relating the differential of this projection to the curvature of the manifold. It will be demonstrated in particular (Theorem 4.8) that the truncated SVD is differentiable so long as the singular values of order $r$ and $r+1$ remain distinct, even if multiple singular values of lower order occur. As a result, an explicit dynamical system is obtained for the evolution of the best low rank approximation of the solution $\mathfrak{R}(t)$ of Eq. 2. Finally, this derivation also allows a sharpening of the initial error analysis of [37].

The article is organized as follows: the Riemannian geometric setting is specified in Section 2. Parameterizations of $\mathscr{M}$ and of its tangent spaces are first recalled. Novel geometric characteristics such as covariant derivative and geodesic equations are then derived. In Section 3, classical results on the differentiability of the orthogonal projection onto smooth embedded sub-manifolds [26] are reviewed and reformulated in a framework that avoids the use of tensor notations. Curvatures with respect to a normal direction are defined, and their relation to the differential of the projection map is stated in Theorem 3.8. These results are applied in Section 4 where the curvature of the fixed rank manifold $\mathscr{M}$ is characterized, and the new formula for the differential of the truncated SVD is provided. The Dynamically Orthogonal approximation (DO) is studied in Section 5. Two justifications of the “reasonable” character of this approximation are given. First, it is shown that this reduced order model corresponds to the dynamical system that applies the SVD truncation at all instants. The error analysis performed by [37] is then extended and improved using the knowledge of the differential of the truncated SVD. The error committed by the DO approximation is shown to be controlled over large integration times provided the original solution remains close to the low rank manifold $\mathscr{M}$ , in the sense that it remains far from the skeleton of $\mathscr{M}$ . This geometric condition can be expressed as an explicit dependence of the error on the gaps between singular values of order $r$ and $r+1$ . Lastly, Riemannian matrix optimization on the fixed rank manifold equipped with the extrinsic geometry is considered in Section 6 as an alternative approach for tracking the truncated SVD. A novel dynamical system is proposed to compute the best low-rank approximation, that is shown to be convergent for almost any initial data.

Notations

Important notations used in this paper are summarized below:

The differential of a smooth function $f$ at the point $R\in\mathcal{M}_{l,m}$ (respectively $R\in\mathscr{M}$ ) in the direction $X\in\mathcal{M}_{l,m}$ (respectively $X\in{\mathcal{T}(R)}$ ) is denoted $\mathrm{D}_{X}f(R)$ :

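$$\mathrm{D}_{X}f(R)=\left.\frac{\mathrm{d}}{\mathrm{d}t}f(R(t))\right|_{t=0}=\lim_{\Delta t\rightarrow 0}\frac{f(R(t+\Delta t))-f(R(t))}{\Delta t},$$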

where $R(t)$ is a curve of $\mathcal{M}_{l,m}$ (respectively $\mathscr{M}$ ) such that $R(0)=R$ and $\dot{R}(0)=X$ . The differential of the orthogonal projection operator $R\mapsto\Pi_{\mathcal{T}(R)}$ at $R\in\mathscr{M}$ , in the direction $X\in{\mathcal{T}(R)}$ and applied to $Y\in\mathcal{M}_{l,m}$ is denoted $\mathrm{D}\Pi_{{\mathcal{T}(R)}}(X)\cdot Y$ :

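$$\mathrm{D}\Pi_{\mathcal{T}(R)}(X)\cdot Y=\left[\left.\frac{\mathrm{d}}{\mathrm{d}t}\Pi_{\mathcal{T}(R(t))}\right|_{t=0}\right](Y)=\left[\lim_{\Delta t\rightarrow 0}\frac{\Pi_{\mathcal{T}(R(t+\Delta t))}-\Pi_{\mathcal{T}(R(t))}}{\Delta t}\right](Y),$$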

where $R(t)$ is a curve drawn on $\mathscr{M}$ such that $R(0)=R$ and $\dot{R}(0)=X$ .

2 Riemannian set up: parameterizations, tangent-space, geodesics

This section establishes the geometric framework of low-rank approximation, by reviewing and unifying results sparsely available in [37, 64, 52], and by providing new expressions for classical geometric characteristics, namely geodesics and covariant derivative. It is not assumed that the reader is accustomed to differential geometry: necessary definitions and properties are recalled. Several concepts of this section are illustrated in Fig. 2(b).

Definition 2.1.

The manifold of $l$ -by- $m$ matrices of rank $r$ is denoted by $\mathscr{M}$ :

$$\mathscr{M}=\{R\in\mathcal{M}_{l,m}\mid\mathrm{rank}(R)=r\}.$$

Remark 2.2.

The fact that $\mathscr{M}$ is a manifold is a consequence of the constant rank theorem ([71], Th. 10, chap. 2, vol. 1), whose assumptions (the map $(U,Z)\mapsto UZ^{T}$ from $\mathrm{St}_{l,r}\times\mathcal{M}_{m,r}^{*}$ to $\mathscr{M}$ is a submersion with differential of constant rank) translate into the requirement that the candidate tangent spaces have constant dimension, as found later in Proposition 2.4. Detailed proofs are available in [71] (exercise 34, chap. 2, vol. 1) or [75] (Prop. 2.1).

The following lemma [60] fixes the parametrization of $\mathscr{M}$ by conveniently representing its elements $R$ in terms of mode and coefficient matrices, $U$ and $Z$ , respectively.

Lemma 2.3.

Any matrix $R\in\mathscr{M}$ can be decomposed as $R=UZ^{T}$ where $U\in\mathrm{St}_{l,r}$ and $Z\in\mathcal{M}_{m,r}^{*}$ , i.e. $U^{T}U=I\textrm{ and }\mathrm{rank}(Z)=r$ , respectively. Furthermore, this decomposition is unique modulo a rotation matrix $P\in O_{r}$ , namely if $U_{1},U_{2}\in\mathcal{M}_{l,r}$ , $Z_{1},Z_{2}\in\mathcal{M}_{m,r}$ , and $U_{1}^{T}U_{1}=U_{2}^{T}U_{2}=I$ , then

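$$U_{1}Z_{1}^{T}=U_{2}Z_{2}^{T}\Leftrightarrow\exists P\in\mathcal{O}_{r},\ U_{1}=U_{2}P\textrm{ and }Z_{1}=Z_{2}P.$$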

In the following, the statement “let $UZ^{T}\in\mathscr{M}$ ” always implicitly assumes $U\in\mathcal{M}_{l,r}$ , $Z\in\mathcal{M}_{m,r}$ , $U^{T}U=I$ , and $\mathrm{rank}(Z)=r$ . Other parameterizations of $\mathscr{M}$ are possible and give equivalent results [50].
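The non-uniqueness stated in Lemma 2.3 can be illustrated with a short NumPy sketch: one decomposition $R=U_{1}Z_{1}^{T}$ is built from the SVD, a second one is obtained by rotating both factors with the same orthogonal matrix, and the rotation is then recovered from the two mode matrices. The sizes, the random data, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
l, m, r = 6, 5, 2
# A generic rank-r matrix, built as a product of two random factors.
R = rng.standard_normal((l, r)) @ rng.standard_normal((r, m))

# First decomposition: orthonormal modes from the SVD, coefficients Z1 = R^T U1.
W, s, Vt = np.linalg.svd(R, full_matrices=False)
U1 = W[:, :r]
Z1 = R.T @ U1

# Second decomposition: rotate both factors by the same orthogonal matrix P.
P, _ = np.linalg.qr(rng.standard_normal((r, r)))
U2, Z2 = U1 @ P, Z1 @ P

# The rotation linking the two decompositions is recovered as U1^T U2.
P_rec = U1.T @ U2
print(np.allclose(U2 @ Z2.T, R), np.allclose(P_rec, P))
```

Both products reproduce the same matrix, so a point of $\mathscr{M}$ corresponds to a whole orbit of pairs $(UP,ZP)$ rather than to a single pair.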

The tangent space ${\mathcal{T}(UZ^{T})}$ at a point $R=UZ^{T}$ is the set of all possible vectors tangent to smooth curves $R(t)=U(t)Z(t)^{T}$ drawn on the manifold $\mathscr{M}$. Therefore, such a tangent vector at $R(0)=UZ^{T}$ is of the form $\dot{R}=\dot{U}Z^{T}+U\dot{Z}^{T}$, where $\dot{U}$ and $\dot{Z}$ are the time derivatives of the matrices $U(t)$ and $Z(t)$ at time $t=0$. In the following, the notations $X_{U}$, $X_{Z}$, and $X=X_{U}Z^{T}+UX_{Z}^{T}$ will be used to denote the tangent directions $\dot{U}$, $\dot{Z}$, and $\dot{R}$ for the respective matrices $U$, $Z$ and $R$. The orthogonality condition that $U^{T}U=I$ must hold for all times implies that $X_{U}$ must satisfy $\dot{U}^{T}U+U^{T}\dot{U}=X_{U}^{T}U+U^{T}X_{U}=0$.

Nevertheless, this is not sufficient to uniquely parameterize tangent vectors $X$ from the displacements $X_{U}$ and $X_{Z}$ for $U$ and $Z$: two different couples $(X_{U},X_{Z})\neq(X_{U}^{\prime},X_{Z}^{\prime})$ satisfying $X_{U}^{T}U+U^{T}X_{U}=X_{U}^{\prime T}U+U^{T}X_{U}^{\prime}=0$ may exist for a single tangent vector $X=X_{U}Z^{T}+UX_{Z}^{T}=X_{U}^{\prime}Z^{T}+UX_{Z}^{\prime T}$. Indeed, rotations $U\leftarrow UP$ of the columns of the mode matrix $U$ do not change the subspace $\textrm{span}(\bm{u}_{i})$ supporting the modal decomposition Eq. 3, and hence can be captured by updating the values of the coefficients $(\zeta_{i})$ contained in the matrix $Z$ with the same rotation $Z\leftarrow ZP$. Infinitesimally, this translates in the tangent space into the invariance of tangent vectors $X=X_{U}Z^{T}+UX_{Z}^{T}$ under the transformations $X_{U}\leftarrow X_{U}+U\Omega$ and $X_{Z}\leftarrow X_{Z}+Z\Omega$ for any skew-symmetric matrix $\Omega=-\Omega^{T}$. This can easily be seen by inserting the transformations into the expression for $X$, since $U\Omega Z^{T}+U\Omega^{T}Z^{T}=0$, or by differentiating the relation $UZ^{T}=(UP)(ZP)^{T}$ with $\dot{P}=\Omega P$. A unique parameterization of the tangent space can be obtained by fixing this infinitesimal rotation $\Omega$, for example by adding the condition that the reduced subspace spanned by the columns of $U$ must dynamically evolve orthogonally to itself, in other words by requiring $U^{T}X_{U}=0$. This gauge condition has thus been called the “Dynamically Orthogonal” condition in [64] and is at the origin of the name “Dynamically Orthogonal approximation”, as further investigated in Section 5.
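The gauge invariance discussed above can be checked numerically. The following sketch, with hypothetical sizes and random data, verifies that a skew-symmetric shift $\Omega$ leaves the tangent vector unchanged, and that the particular choice $\Omega=-U^{T}X_{U}$ (one way to fix the gauge, used here for illustration) enforces the condition $U^{T}X_{U}=0$ without changing $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
l, m, r = 7, 5, 3
U, _ = np.linalg.qr(rng.standard_normal((l, r)))   # orthonormal columns, U^T U = I
Z = rng.standard_normal((m, r))                    # generically of full column rank

# A generic pair (X_U, X_Z) with X_U^T U + U^T X_U = 0 (not yet gauge-fixed):
# a skew-symmetric part along U plus an arbitrary part orthogonal to U.
A = rng.standard_normal((r, r))
X_U = U @ (A - A.T) + (np.eye(l) - U @ U.T) @ rng.standard_normal((l, r))
X_Z = rng.standard_normal((m, r))
X = X_U @ Z.T + U @ X_Z.T

# Any skew-symmetric Omega leaves the tangent vector X unchanged.
B = rng.standard_normal((r, r))
Omega = B - B.T
X_shifted = (X_U + U @ Omega) @ Z.T + U @ (X_Z + Z @ Omega).T

# Gauge fixing: Omega = -U^T X_U is skew-symmetric and enforces U^T X_U = 0.
Omega_do = -U.T @ X_U
X_U_do = X_U + U @ Omega_do
X_Z_do = X_Z + Z @ Omega_do
X_do = X_U_do @ Z.T + U @ X_Z_do.T
print(np.allclose(X_shifted, X), np.allclose(U.T @ X_U_do, 0), np.allclose(X_do, X))
```

After gauge fixing, `X_U_do` equals $(I-UU^{T})X_{U}$, so the pair `(X_U_do, X_Z_do)` is the representative of $X$ that satisfies the Dynamically Orthogonal condition.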

Proposition 2.4.

The tangent space of $\mathscr{M}$ at $R=UZ^{T}\in\mathscr{M}$ is the set

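$$\mathcal{T}(UZ^{T})=\{X_{U}Z^{T}+UX_{Z}^{T}\mid X_{U}\in\mathcal{M}_{l,r},\ X_{Z}\in\mathcal{M}_{m,r},\ U^{T}X_{U}+X_{U}^{T}U=0\}.\tag{8}$$

${\mathcal{T}(UZ^{T})}$ is uniquely parameterized by the horizontal space

$$\mathcal{H}_{(U,Z)}=\{(X_{U},X_{Z})\in\mathcal{M}_{l,r}\times\mathcal{M}_{m,r}\mid U^{T}X_{U}=0\},\tag{9}$$

that is, for any tangent vector $X\in{\mathcal{T}(UZ^{T})}$, there exists a unique $(X_{U},X_{Z})\in\mathcal{H}_{(U,Z)}$ such that $X=X_{U}Z^{T}+UX_{Z}^{T}$. As a consequence, $\mathscr{M}$ is a smooth manifold of dimension $\mathrm{dim}(\mathcal{H}_{(U,Z)})=(l+m)r-r^{2}$.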

Proof 2.5.

(see also [37, 3]) One can always write a tangent vector $X$ as

$$X=\dot{U}Z^{T}+U\dot{Z}^{T}=U(\dot{Z}^{T}+U^{T}\dot{U}Z^{T})+((I-UU^{T})\dot{U})Z^{T}=X_{U}Z^{T}+UX_{Z}^{T},$$

for some $\dot{U}\in\mathcal{M}_{l,r}$ and $\dot{Z}\in\mathcal{M}_{m,r}$, with $X_{U}=(I-UU^{T})\dot{U}$ satisfying $X_{U}^{T}U=0$ and $X_{Z}^{T}=\dot{Z}^{T}+U^{T}\dot{U}Z^{T}$. This implies ${\mathcal{T}(UZ^{T})}=\{X_{U}Z^{T}+UX_{Z}^{T}|(X_{U},X_{Z})\in\mathcal{H}_{(U,Z)}\}$. Furthermore, if $X=UX_{Z}^{T}+X_{U}Z^{T}$ with $U^{T}X_{U}=0$, then the relations $X_{Z}=X^{T}U$ and $X_{U}=(I-UU^{T})XZ(Z^{T}Z)^{-1}$ show that $(X_{U},X_{Z})\in\mathcal{H}_{(U,Z)}$ is defined uniquely from $X$.

Remark 2.6.

The denomination “horizontal space” for the set $\mathcal{H}_{(U,Z)}$ Eq. 9 refers to the definition of a non-ambiguous representation of the tangent space ${\mathcal{T}(UZ^{T})}$ Eq. 8. This notion is developed rigorously in the theory of quotient manifolds, e.g. [50, 18].

In the following, the notation $X=(X_{U},X_{Z})$ is used equivalently to denote a tangent vector $X=X_{U}Z^{T}+UX_{Z}^{T}\in{\mathcal{T}(UZ^{T})}$ , where $U^{T}X_{U}=0$ is implicitly assumed.

A metric is needed to define how distances are measured on the manifold, by prescribing a smoothly varying scalar product on each tangent space. In [50] and other works in matrix optimization, e.g. [5, 75, 67], one uses the metric induced by the parametrization of the manifold $\mathscr{M}$: the norm of a tangent vector $(X_{U},X_{Z})\in\mathcal{H}_{(U,Z)}$ is defined to be $||(X_{U},X_{Z})||^{2}=||X_{U}||_{\mathrm{St}_{l,r}}^{2}+||X_{Z}||_{\mathcal{M}_{m,r}}^{2}$ where $||\;||_{\mathrm{St}_{l,r}}$ is a canonical norm on the Stiefel Manifold (see [18]) and $||\;||_{\mathcal{M}_{m,r}}$ is the Frobenius norm on $\mathcal{M}_{m,r}$. In this work, one is rather interested in the metric inherited from the ambient full space $\mathcal{M}_{l,m}$, since it is the metric used to estimate the distance from a matrix $\mathfrak{R}\in\mathcal{M}_{l,m}$ to its best rank-$r$ approximation, namely the error committed by the truncated SVD.

Definition 2.7.

At each point $UZ^{T}\in\mathscr{M}$ , the metric $g$ on $\mathscr{M}$ is the scalar product acting on the tangent space ${\mathcal{T}(UZ^{T})}$ that is inherited from the scalar product of $\mathcal{M}_{l,m}$ :

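$$g((X_{U},X_{Z}),(Y_{U},Y_{Z}))=\langle X_{U}Z^{T}+UX_{Z}^{T},\,Y_{U}Z^{T}+UY_{Z}^{T}\rangle=\mathrm{Tr}(Z^{T}ZX_{U}^{T}Y_{U}+X_{Z}^{T}Y_{Z}).$$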
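To make Definition 2.7 concrete, the sketch below compares the ambient Frobenius scalar product of two tangent vectors with the expression $\mathrm{Tr}(Z^{T}ZX_{U}^{T}Y_{U}+X_{Z}^{T}Y_{Z})$ written in terms of horizontal pairs (the closed form listed for the metric in the equation list of this document). The helper names `horizontal_pair` and `metric`, the sizes, and the random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
l, m, r = 7, 5, 3
U, _ = np.linalg.qr(rng.standard_normal((l, r)))   # orthonormal columns
Z = rng.standard_normal((m, r))                    # generically of full column rank
P_perp = np.eye(l) - U @ U.T                       # projector onto the complement of span(U)

def horizontal_pair():
    """Random pair (X_U, X_Z) satisfying the gauge condition U^T X_U = 0."""
    return P_perp @ rng.standard_normal((l, r)), rng.standard_normal((m, r))

def metric(XU, XZ, YU, YZ):
    """Metric in horizontal coordinates: Tr(Z^T Z XU^T YU + XZ^T YZ)."""
    return np.trace(Z.T @ Z @ XU.T @ YU + XZ.T @ YZ)

XU, XZ = horizontal_pair()
YU, YZ = horizontal_pair()
X = XU @ Z.T + U @ XZ.T          # tangent vectors as l-by-m matrices
Y = YU @ Z.T + U @ YZ.T

g_horizontal = metric(XU, XZ, YU, YZ)
g_ambient = np.trace(X.T @ Y)    # Frobenius scalar product in the ambient space
print(g_horizontal, g_ambient)
```

The cross terms vanish because $U^{T}X_{U}=U^{T}Y_{U}=0$, which is why the two numbers coincide up to round-off.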

A main object of this paper is the orthogonal projection $\Pi_{\mathcal{T}(R)}$ onto the tangent space ${\mathcal{T}(R)}$ at a point $R$ on $\mathscr{M}$ . This map projects displacements $\mathfrak{X}=\dot{\mathfrak{R}}\in\mathcal{M}_{l,m}$ of a matrix $\mathfrak{R}$ of the ambient space $\mathcal{M}_{l,m}$ to the tangent directions $X=\Pi_{\mathcal{T}(R)}\mathfrak{X}\in{\mathcal{T}(R)}$ .

Proposition 2.8.

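At every point $UZ^{T}\in\mathscr{M}$, the orthogonal projection $\Pi_{{\mathcal{T}(UZ^{T})}}$ onto the tangent space ${\mathcal{T}(UZ^{T})}$ is the map

$$\begin{array}{ccccc}\Pi_{\mathcal{T}(UZ^{T})}&:&\mathcal{M}_{l,m}&\rightarrow&\mathcal{H}_{(U,Z)}\\ &&\mathfrak{X}&\mapsto&((I-UU^{T})\mathfrak{X}Z(Z^{T}Z)^{-1},\mathfrak{X}^{T}U).\end{array}$$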

Proof 2.9.

(see also [37]) $\Pi_{\mathcal{T}(R)}\mathfrak{X}$ is obtained as the unique minimizer of the convex functional $J(X_{U},X_{Z})=\frac{1}{2}||\mathfrak{X}-X_{U}Z^{T}-UX_{Z}^{T}||^{2}$ on the space $\mathcal{H}_{(U,Z)}$ . The minimizer $(X_{U},X_{Z})$ is characterized by the vanishing of the gradient of $J$ :

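$$\forall\Delta\in\mathcal{M}_{l,r},\ \Delta^{T}U=0\Rightarrow\frac{\partial J}{\partial X_{U}}\cdot\Delta=-\langle\mathfrak{X}-X_{U}Z^{T}-UX_{Z}^{T},\Delta Z^{T}\rangle=0,$$

$$\forall\Delta\in\mathcal{M}_{m,r},\ \frac{\partial J}{\partial X_{Z}}\cdot\Delta=-\langle\mathfrak{X}-X_{U}Z^{T}-UX_{Z}^{T},U\Delta^{T}\rangle=0,$$

yielding respectively $X_{U}=(I-UU^{T})\mathfrak{X}Z(Z^{T}Z)^{-1}$ and $X_{Z}=\mathfrak{X}^{T}U$.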

The orthogonal complement of the tangent space ${\mathcal{T}(R)}$ is obtained from the identity $(I-\Pi_{\mathcal{T}(UZ^{T})})\cdot\mathfrak{X}=(I-UU^{T})\mathfrak{X}(I-Z(Z^{T}Z)^{-1}Z^{T})$ :

Definition 2.10.

The normal space ${\mathcal{N}(R)}$ of $\mathscr{M}$ at $R=UZ^{T}$ is defined as the orthogonal complement to the tangent space ${\mathcal{T}(R)}$ . For the fixed rank manifold $\mathscr{M}$ :

[TABLE]

In model order reduction, a matrix $R=UZ^{T}\in\mathscr{M}$ is usually a low rank- $r$ approximation of a full rank matrix $\mathfrak{R}\in\mathcal{M}_{l,m}$ . The following proposition shows that the normal space at $R$ , ${\mathcal{N}(R)}$ , can be understood as the set of all possible completions of the approximation Eq. 3:

Proposition 2.11.

Let $N$ be a given normal vector $N\in{\mathcal{N}(R)}$ at $R=UZ^{T}\in\mathscr{M}$ and denote $k=\mathrm{rank}(N)$ . Then there exists an orthonormal basis of vectors $(u_{i})_{1\leq i\leq l}$ in $\mathbb{R}^{l}$ , an orthonormal basis $(v_{i})_{1\leq i\leq m}$ of $\mathbb{R}^{m}$ , and $r+k$ non zero singular values $(\sigma_{i})_{1\leq i\leq r+k}$ such that

[TABLE]

Proof 2.12.

Consider the SVD $N=U_{N}\Theta V_{N}^{T}$ of $N$ [31]. Since $U^{T}N=0$ , $r$ columns of $U_{N}$ are spanned by $U$ and associated with zero singular values of $N$ ; therefore $u_{i}$ is obtained from the columns of $U$ for $1\leq i\leq r$ and from the left singular vectors of $N$ associated with non-zero singular values for $r+1\leq i\leq r+k$ , $k\geq 0$ . The vectors $v_{i}$ and $v_{r+j}$ are obtained similarly. The singular values $\sigma_{i}$ are obtained as the union of the $r$ non-zero singular values of $Z$ and the $k$ non-zero singular values of $N$ .
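The statement can be checked numerically on a small example. The sketch below assumes hypothetical sizes and singular values, builds $R$ and $N$ from shared orthonormal families, verifies the normality conditions $U^{T}N=0$ and $NZ=0$, and confirms that the singular values of the completion $R+N$ are the union of those of $R$ and $N$. One singular value of $N$ is deliberately chosen larger than those of $R$, since the proposition assumes no ordering.

```python
import numpy as np

rng = np.random.default_rng(1)
l, m, r, k = 7, 6, 2, 3                      # hypothetical sizes
Qu, _ = np.linalg.qr(rng.standard_normal((l, l)))   # orthonormal basis (u_i)
Qv, _ = np.linalg.qr(rng.standard_normal((m, m)))   # orthonormal basis (v_i)
sigma_R = np.array([4.0, 3.0])               # singular values carried by R
sigma_N = np.array([5.0, 1.0, 0.5])          # singular values carried by N

U = Qu[:, :r]
Z = Qv[:, :r] * sigma_R                      # so that R = U Z^T
R = U @ Z.T
N = (Qu[:, r:r + k] * sigma_N) @ Qv[:, r:r + k].T

# N is normal to the manifold at R: U^T N = 0 and N Z = 0
normal_error = max(np.linalg.norm(U.T @ N), np.linalg.norm(N @ Z))

# Singular values of the completion R + N (returned in decreasing order)
sv_completion = np.linalg.svd(R + N, compute_uv=False)
expected = np.sort(np.concatenate([sigma_R, sigma_N]))[::-1]
```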

In differential geometry, one distinguishes the geometric properties that are intrinsic, i.e. that depend only on the metric $g$ defined on the manifold, from the ones that are extrinsic, i.e. that depend on the ambient space in which the manifold $\mathscr{M}$ is defined. The following proposition recalls the link between the extrinsic projection $\Pi_{\mathcal{T}(R)}$ and the intrinsic notion of differentiation on a manifold. For embedded manifolds, i.e. defined as subsets of an ambient space, the covariant derivative at $R\in\mathscr{M}$ is obtained by projecting the usual derivative onto the tangent space ${\mathcal{T}(R)}$ , and the Christoffel symbol corresponds to the normal component that has been removed [18].

Proposition 2.13.

Let $X$ and $Y$ be two tangent vector fields defined on a neighborhood of $R\in\mathscr{M}$ . The covariant derivative $\nabla_{X}Y$ with respect to the metric inherited from the ambient space is the projection of $\mathrm{D}_{X}Y$ onto the tangent space ${\mathcal{T}(R)}$ :

[TABLE]

The Christoffel symbol $\Gamma(X,Y)$ is defined by the relationship $\nabla_{X}Y=\mathrm{D}_{X}Y+\Gamma(X,Y)$ and is characterized by the formula

[TABLE]

The Christoffel symbol is symmetric: $\Gamma(X,Y)=\Gamma(Y,X)$ .

Proof 2.14.

See [71], Vol. 3, Ch. 1.

Remark 2.15.

An important feature of this definition is that the Christoffel symbol $\Gamma(X,Y)=-\mathrm{D}\Pi_{\mathcal{T}(R)}(X)\cdot Y$ depends only on the projection map $\Pi_{\mathcal{T}}$ at the point $R$ and not on neighboring values of the tangent vectors $X,Y$ , which is a priori not clear from the equality $\Gamma(X,Y)=-(I-\Pi_{\mathcal{T}(R)})\mathrm{D}_{X}Y$ . The Christoffel symbol $\Gamma(X,Y)$ is computed explicitly for the matrix manifold $\mathscr{M}$ in Remark 4.5.

The covariant derivative allows one to obtain equations for the geodesics of the manifold $\mathscr{M}$ . These geodesics (Fig. 2(b)) are the shortest paths among all possible smooth curves drawn on $\mathscr{M}$ joining two sufficiently close points. Mathematically, they are curves $R(t)=U(t)Z(t)^{T}$ characterized by a velocity $\dot{R}=\dot{U}Z^{T}+U\dot{Z}^{T}$ that is stationary under the covariant derivative [71], i.e. $\nabla_{\dot{R}}\dot{R}=0$ . Since $\mathrm{D}_{\dot{R}}\dot{R}=\ddot{R}$ , this leads to

[TABLE]

Theorem 2.16.

Consider $X=(X_{U},X_{Z})\in\mathcal{H}_{(U,Z)}$ and $Y=(Y_{U},Y_{Z})\in\mathcal{H}_{(U,Z)}$ two tangent vector fields. The covariant derivative $\nabla_{X}Y$ on $\mathscr{M}$ is given by

[TABLE]

Therefore, geodesic equations on $\mathscr{M}$ are given by

[TABLE]

Proof 2.17.

Writing $X=X_{U}Z^{T}+UX_{Z}^{T}$ and $Y=Y_{U}Z^{T}+UY_{Z}^{T}$ , one obtains:

[TABLE]

Applying the projection $\Pi_{\mathcal{T}(UZ^{T})}$ using Eq. 11, i.e.

[TABLE]

yields in the coordinates of the horizontal space:

[TABLE]

Eq. 15 is obtained by differentiating the constraint $U^{T}Y_{U}=0$ along the direction $X$ , i.e. $X_{U}^{T}Y_{U}+U^{T}\mathrm{D}_{X}Y_{U}=0$ , and substituting the resulting value of $U^{T}\mathrm{D}_{X}Y_{U}$ into the above expression. Since $\mathrm{D}_{(\dot{U},\dot{Z})}(\dot{U})=\ddot{U}$ and $\mathrm{D}_{(\dot{U},\dot{Z})}(\dot{Z})=\ddot{Z}$ , $\nabla_{(\dot{U},\dot{Z})}(\dot{U},\dot{Z})=0$ yields Eq. 16.

Remark 2.18.

Physically, a curve $R(t)=U(t)Z(t)^{T}$ describes a geodesic on $\mathscr{M}$ if and only if its acceleration lies in the normal space at all instants (Eq. 14) [18, 71].

Geodesics allow one to define the exponential map [71], which indicates how to walk on the manifold from a point $R\in\mathscr{M}$ along a straight direction $X\in{\mathcal{T}(R)}$ .

Definition 2.19.

The exponential map $\exp_{UZ^{T}}$ at $R=UZ^{T}\in\mathscr{M}$ is the function

[TABLE]

where $R(1)=U(1)Z(1)^{T}$ is the value at time 1 of the solution of the geodesic equation Eq. 16 with initial conditions $U(0)Z(0)^{T}=R$ and $(\dot{U}(0),\dot{Z}(0))=X$ . The value of the velocity of the point $R(1)=\exp_{UZ^{T}}(X)$ ,

[TABLE]

is called the parallel transport of $X$ from $R$ to $R(1)$ .

3 Curvature and differentiability of the orthogonal projection onto smooth embedded manifolds

Differentiability results for the orthogonal projection onto smooth embedded manifolds, as presented with tensor notations in [7], are now centralized and adapted to the present study. The main motivation is that the SVD truncation (Section 4) is an example of such orthogonal projection in the particular case of the fixed-rank manifold. Hence, general geometric differentiability results for the projections will transpose directly into a formula for the differential of the application mapping a matrix to its best low rank approximation. The same analysis can be applied to other matrix manifolds to obtain the differential of other algebraic operations, and even generalized to non-Euclidean ambient spaces, which is the object of [23]. In this section, the space of $l$ -by- $m$ matrices $\mathcal{M}_{l,m}$ is replaced with a general finite dimensional Euclidean space $E$ , and the fixed rank manifold with any given smooth embedded manifold $\mathscr{M}\subset E$ .

Definition 3.1.

Let $\mathscr{M}$ be a smooth manifold embedded in a Euclidean space $E$ . The orthogonal projection of a point $\mathfrak{R}$ onto $\mathscr{M}$ is defined whenever there is a unique point $\Pi_{\mathscr{M}}(\mathfrak{R})\in\mathscr{M}$ minimizing the Euclidean distance from $\mathfrak{R}$ to $\mathscr{M}$ , i.e.

[TABLE]

A fundamental property of the orthogonal projection is that the vector $\mathfrak{R}-R$ is normal to $\mathscr{M}$ for the point $R=\Pi_{\mathscr{M}}(\mathfrak{R})$ , as geometrically illustrated in Fig. 2(b):

Proposition 3.2.

Whenever $\Pi_{\mathscr{M}}(\mathfrak{R})$ is defined, the residual $\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})\in{\mathcal{N}(R)}$ must be normal to $\mathscr{M}$ at $R$ , namely

[TABLE]

Proof 3.3.

For any tangent vector $X\in{\mathcal{T}(R)}$ , consider a curve $R(t)$ drawn on $\mathscr{M}$ such that $R(0)=R$ and $\dot{R}(0)=X$ where $R$ is minimizing $J(R)=\frac{1}{2}||\mathfrak{R}-R||^{2}$ . Then the stationarity condition $\left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}J(R(t))=-\langle\mathfrak{R}-R,X\rangle=0$ states precisely Eq. 19.

The following proposition, also used in the proofs of [37], provides an equation for the differential of $\Pi_{\mathscr{M}}$ , that will be solved by the study of the curvature of $\mathscr{M}$ .

Proposition 3.4.

Suppose the projection $\Pi_{\mathscr{M}}$ is defined and differentiable at $\mathfrak{R}$ . Then the differential $\mathrm{D}_{\mathfrak{X}}\Pi_{\mathscr{M}}(\mathfrak{R})$ of $\Pi_{\mathscr{M}}$ at the point $\mathfrak{R}$ in the direction $\mathfrak{X}\in E$ satisfies :

[TABLE]

Proof 3.5.

Differentiating Eq. 19 along the direction $\mathfrak{X}$ yields

[TABLE]

Since $\Pi_{\mathscr{M}}(\mathfrak{R})\in\mathscr{M}$ for any $\mathfrak{R}$ , the differential $\mathrm{D}_{\mathfrak{X}}\Pi_{\mathscr{M}}(\mathfrak{R})$ is a tangent vector, and the result follows from the relation $\Pi_{\mathcal{T}(\Pi_{\mathscr{M}}(\mathfrak{R}))}(\mathrm{D}_{\mathfrak{X}}\Pi_{\mathscr{M}}(\mathfrak{R}))=\mathrm{D}_{\mathfrak{X}}\Pi_{\mathscr{M}}(\mathfrak{R})$ .

Let $R=\Pi_{\mathscr{M}}(\mathfrak{R})$ be the projection of the point $\mathfrak{R}$ on $\mathscr{M}$ and $N=\mathfrak{R}-R$ the corresponding normal residual vector. Solving Eq. 20 for the differential $X=\mathrm{D}_{\mathfrak{X}}\Pi_{\mathscr{M}}(\mathfrak{R})$ requires inverting the linear operator $I-L_{R}(N)$ where $L_{R}(N)$ is the map $X\mapsto\mathrm{D}\Pi_{\mathcal{T}(R)}(X)\cdot N$ . $L_{R}(N)$ would be zero if $\mathscr{M}$ were a “flat” vector subspace and can be interpreted as a curvature correction. In fact, $L_{R}(N)$ is nothing else than the Weingarten map, at the origin of the definition of principal curvatures. For an embedded hypersurface, this application maps tangent vectors $X$ to the tangent variations $-\mathrm{D}_{X}N$ of the unit normal vector field $N$ , and the eigenvalues and eigenvectors of this symmetric endomorphism define the principal curvatures and directions of the hypersurface ([71], Vol. 2). For general smooth embedded sub-manifolds, a Weingarten map is defined for every possible normal direction [68, 7, 4, 2].

Definition 3.6 (Weingarten map).

For any point $R\in\mathscr{M}$ , tangent and normal vector fields $X,Y\in{\mathcal{T}(R)}$ and $N\in{\mathcal{N}(R)}$ defined on a neighborhood of $R$ , the following relation, called the Weingarten identity, holds:

[TABLE]

Also, the tangent variations $\Pi_{\mathcal{T}(R)}(\mathrm{D}_{X}N)$ depend only on the value of the normal vector field $N$ at $R$ as it can be seen from the identity

[TABLE]

The application

[TABLE]

is therefore a symmetric map of the tangent space into itself and is called the Weingarten map in the normal direction $N$ . The corresponding eigenvectors and eigenvalues are respectively called the principal directions and principal curvatures of $\mathscr{M}$ in the normal direction $N$ . The induced symmetric bilinear form on the tangent space,

[TABLE]

is called the second fundamental form in the direction $N$ .

Proof 3.7.

See [68] or the proof of Theorem 5 of [71], Vol. 3, Ch. 1.

The differentiability of the projection map for arbitrary sets has been studied in [81, 1] and more recently in the context of smooth manifolds in [7, 26, 11] with recent applications in shape optimization [6]. The following theorem reformulates these results in the framework of this article. The proof given in Appendix A is essentially a justification that one can indeed invert the operator $I-L_{R}(N)$ by using its eigendecomposition. Recall that the adherence $\overline{\mathscr{M}}$ is the set of limit points of $\mathscr{M}$ . In this paper, the boundary of a manifold is defined as the set $\partial\mathscr{M}=\overline{\mathscr{M}}\backslash\mathscr{M}$ .

Theorem 3.8.

Let $\Omega\subset E$ be an open set of $E$ and assume that for any $\mathfrak{R}\in\Omega$ , there exists a unique projection $\Pi_{\mathscr{M}}(\mathfrak{R})\in\mathscr{M}$ such that

[TABLE]

and that in addition, there is no other projection on the boundary $\partial\mathscr{M}$ of $\mathscr{M}$ :

[TABLE]

For $\mathfrak{R}\in\Omega$ , denote $\kappa_{i}(N)$ and $\Phi_{i}$ the respective eigenvalues and eigenvectors of the Weingarten map $L_{R}(N)$ at $R=\Pi_{\mathscr{M}}(\mathfrak{R})$ with the normal direction $N=\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})$ . Then all the principal curvatures satisfy $\kappa_{i}(N)<1$ and the projection $\Pi_{\mathscr{M}}$ is differentiable at $\mathfrak{R}$ . The differential $\mathrm{D}_{\mathfrak{X}}\Pi_{\mathscr{M}}(\mathfrak{R})$ at $\mathfrak{R}$ in the direction $\mathfrak{X}$ satisfies

[TABLE]

Proof 3.9.

See Appendix A or [7].

The set $\textrm{Sk}(\mathscr{M})\subset E$ of points that admit more than one possible projection is called the skeleton of $\mathscr{M}$ (see [15]). One cannot expect the projection map to be differentiable at points that are in the adherence $\overline{\textrm{Sk}(\mathscr{M})}$ , as there is a “jump” of the projected values across $\textrm{Sk}(\mathscr{M})$ (Fig. 3).

Eq. 26 is analogous to the formula presented in [26] for hypersurfaces (Lemma 14.17). In this framework, one retrieves the usual notion of principal curvature by considering the eigenvalues $\kappa_{i}(N)$ for a normalized normal vector $N$ . With the curvature radii defined as the inverses of the curvatures, $\rho_{i}=\kappa_{i}\left(\frac{N}{||N||}\right)^{-1}$ , the condition $\kappa_{i}(N)=||\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})||/\rho_{i}\neq 1$ states that the projection $\Pi_{\mathscr{M}}$ is differentiable at points $\mathfrak{R}$ that are not centers of curvature. Note that assumption (25) is required to deal with non-closed manifolds (boundary points not being considered part of the manifold), which is the case for the fixed rank matrix manifold.
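A simple sanity check of this curvature interpretation is the unit sphere in $\mathbb{R}^{3}$, for which $\Pi_{\mathscr{M}}(x)=x/||x||$ and all curvature radii equal 1. The sketch below assumes that inverting $I-L_{R}(N)$ on the tangent space amounts to dividing each tangent component by $1-\kappa_{i}(N)$, with $\kappa(N)=1-||x||$ for the residual $N=x-\Pi_{\mathscr{M}}(x)$ (positive when $x$ lies inside the sphere, on the side of the center of curvature). The result is compared against a central finite difference; the test points and step size are arbitrary choices.

```python
import numpy as np

def project_sphere(x):
    """Closest point on the unit sphere (defined for x != 0)."""
    return x / np.linalg.norm(x)

def differential_from_curvature(x, h):
    """Tangent projection of h at n = Pi(x), rescaled by 1 / (1 - kappa)."""
    n = project_sphere(x)
    kappa = 1.0 - np.linalg.norm(x)      # curvature in the direction N = x - n
    tangent_part = h - n * (n @ h)       # orthogonal projection onto T(n)
    return tangent_part / (1.0 - kappa)

def finite_difference(x, h, eps=1e-6):
    """Central difference approximation of the directional derivative."""
    return (project_sphere(x + eps * h) - project_sphere(x - eps * h)) / (2 * eps)

h = np.array([0.3, -1.0, 0.7])
errors = []
for x in (np.array([0.2, 0.1, -0.3]),    # inside the sphere, 0 < kappa < 1
          np.array([1.5, -2.0, 0.5])):   # outside the sphere, kappa < 0
    exact = differential_from_curvature(x, h)
    errors.append(np.linalg.norm(exact - finite_difference(x, h)))
max_error = max(errors)
```

In this example $\kappa(N)$ reaches 1 only at the center $x=0$, which is the single center of curvature of the sphere and the only point where the projection is not defined.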

4 Curvature of the fixed rank matrix manifold and the differentiability of the SVD truncation

In the following, $\mathscr{M}\subset\mathcal{M}_{l,m}$ denotes again the fixed rank matrix manifold of Definition 2.1 and $E=\mathcal{M}_{l,m}$ is the space of $l$ -by- $m$ matrices. It is well known [27, 32] that the truncated SVD, i.e. the map that sets all singular values of a matrix $\mathfrak{R}$ to zero except the $r$ largest, yields the best rank $r$ approximation.

Definition 4.1.

Let $\mathfrak{R}\in\mathcal{M}_{l,m}$ be a matrix of rank at least $r$ , i.e. of rank $r+k$ with $k\geq 0$ , and denote $\mathfrak{R}=\sum_{i=1}^{r+k}\sigma_{i}(\mathfrak{R})u_{i}v_{i}^{T}$ its singular value decomposition. If $\sigma_{r}(\mathfrak{R})>\sigma_{r+1}(\mathfrak{R})$ , then the rank $r$ truncated SVD

[TABLE]

is the unique matrix $R\in\mathscr{M}$ minimizing the Euclidean distance $R\mapsto||\mathfrak{R}-R||$ .

Remark 4.2.

The skeleton of $\mathscr{M}$ (Fig. 3) is therefore the set

[TABLE]

characterized by the crossing of the singular values of order $r$ and $r+1$ .
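The definition and the remark can be illustrated with a few lines of NumPy. The sketch below uses a hypothetical random test matrix, computes the rank $r$ truncated SVD, and checks the known fact that the Frobenius error equals the norm of the discarded singular values. It then exhibits a point of the skeleton, the $2\times 2$ identity with $r=1$, for which two distinct rank 1 matrices are equally close. The helper name `truncated_svd` is an illustrative choice.

```python
import numpy as np

def truncated_svd(A, r):
    """Return the rank-r truncated SVD of A and all singular values of A."""
    W, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (W[:, :r] * s[:r]) @ Vt[:r], s

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 5))          # hypothetical full-rank matrix
r = 2
R, s = truncated_svd(A, r)

# Frobenius error of the truncation vs. norm of the discarded singular values
error = np.linalg.norm(A - R)
expected_error = np.sqrt(np.sum(s[r:] ** 2))

# A point on the skeleton for r = 1: sigma_1 = sigma_2 = 1
B = np.diag([1.0, 1.0])
R_a = np.diag([1.0, 0.0])
R_b = np.diag([0.0, 1.0])
dist_a = np.linalg.norm(B - R_a)
dist_b = np.linalg.norm(B - R_b)
```

For the matrix `B`, every rank 1 matrix of the form $ww^{T}$ with $||w||=1$ lies at the same distance 1, so the projection is not uniquely defined there.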

In the following, the Weingarten map for the fixed rank manifold is derived. Note that its expression has been previously found by [4] under the form of equation Eq. 31 below.

Proposition 4.3.

The Weingarten map $L_{R}(N)$ of the fixed rank manifold $\mathscr{M}$ in the normal direction $N\in{\mathcal{N}(R)}$ is the application:

[TABLE]

Or, denoting $R=\sum_{i=1}^{r}\sigma_{i}u_{i}v_{i}^{T}$ and $N=\sum_{j=1}^{k}\sigma_{r+j}u_{r+j}v_{r+j}^{T}$ as in Proposition 2.11, this can be rewritten more explicitly as

[TABLE]

The second fundamental form is given by:

[TABLE]

Proof 4.4.

Differentiating Eq. 11 along the tangent direction $X=(X_{U},X_{Z})\in\mathcal{H}_{(U,Z)}$ , and using the relations $U^{T}N=0$ and $NZ=0$ , yields

[TABLE]

The normality of $N$ implies that $(NX_{Z}(Z^{T}Z)^{-1},N^{T}X_{U})$ is a vector of the horizontal space and therefore Eq. 27 follows. Eq. 30 can be rewritten as

[TABLE]

by expressing $X_{U}=(I-UU^{T})XZ(Z^{T}Z)^{-1}$ and $X_{Z}=X^{T}U$ in terms of $X$ (Eq. 11), from which Eq. 28 is derived by introducing the singular vectors $(u_{i})$ , $(v_{i})$ and singular values $(\sigma_{i})$ . One obtains Eq. 29 by evaluating the scalar product $\langle X,L_{R}(N)(Y)\rangle$ with the metric $g$ (Eq. 10).

Remark 4.5.

The Christoffel symbol is deduced from Eqs. 29 and 23:

[TABLE]

Theorem 4.6.

Consider a point $R=UZ^{T}=\sum_{i=1}^{r}\sigma_{i}u_{i}v_{i}^{T}\in\mathscr{M}$ and a normal vector $N=\sum_{j=1}^{k}\sigma_{r+j}u_{r+j}v_{r+j}^{T}\in{\mathcal{N}(R)}$ (no ordering of the singular values is assumed). At $R$ and in the direction $N$ , there are $2kr$ non-zero principal curvatures

[TABLE]

for all possible combinations of non-zero singular values $\sigma_{r+j},\sigma_{i}$ for $1\leq i\leq r$ and $1\leq j\leq k$ . The normalized corresponding principal directions are the tangent vectors

[TABLE]

The other principal curvatures are null and associated with the principal subspace

[TABLE]

Proof 4.7.

From Eq. 28, it is clear that $L_{R}(N)\Phi_{i,r+j}^{\pm}=\kappa_{i,r+j}^{\pm}(N)\Phi_{i,r+j}^{\pm}$ . In addition, $\Phi_{i,r+j}^{\pm}$ is indeed a tangent vector as one can write $\Phi_{i,r+j}^{\pm}=X_{U}Z^{T}\pm UX_{Z}^{T}$ with:

[TABLE]

Therefore $(\Phi_{i,r+j}^{\pm})$ is a family of $2kr$ independent eigenvectors. Then it is easy to check that $\mathrm{span}\{(u_{i}v^{T})_{1\leq i\leq r}|Nv=0\}$ and $\mathrm{span}\{(uv_{i}^{T})_{1\leq i\leq r}|u^{T}N=u^{T}U=0\}$ are null eigenspaces of respective dimension $(m-k)r$ and $(l-k-r)r$ . The total dimension obtained is $(m-k)r+(l-k-r)r+2kr=mr+lr-r^{2}$ , implying that the full spectral decomposition has been characterized.
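The eigenvalue relation can be verified numerically. The sketch below evaluates the Weingarten map through its horizontal coordinates $(NX_{Z}(Z^{T}Z)^{-1},N^{T}X_{U})$ as given in Proof 4.4, and applies it to candidate principal directions. It assumes the normalized form $\Phi_{i,r+j}^{\pm}=\frac{1}{\sqrt{2}}(u_{r+j}v_{i}^{T}\pm u_{i}v_{r+j}^{T})$ and the curvatures $\pm\sigma_{r+j}/\sigma_{i}$; sizes and singular values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
l, m, r, k = 6, 5, 2, 2                      # hypothetical sizes
Qu, _ = np.linalg.qr(rng.standard_normal((l, l)))
Qv, _ = np.linalg.qr(rng.standard_normal((m, m)))
sigma = np.array([3.0, 2.0, 0.8, 0.4])       # sigma_1..sigma_r, then sigma_{r+1}..sigma_{r+k}

U = Qu[:, :r]
Z = Qv[:, :r] * sigma[:r]                    # R = U Z^T
N = (Qu[:, r:r + k] * sigma[r:]) @ Qv[:, r:r + k].T
G_inv = np.linalg.inv(Z.T @ Z)

def weingarten(X):
    """Apply L_R(N) to an l x m matrix X via its horizontal coordinates."""
    X_U = (X - U @ (U.T @ X)) @ Z @ G_inv
    X_Z = X.T @ U
    # Tangent vector with coordinates (N X_Z (Z^T Z)^{-1}, N^T X_U)
    return N @ X_Z @ G_inv @ Z.T + U @ (N.T @ X_U).T

max_residual = 0.0
for i in range(r):
    for j in range(k):
        A = np.outer(Qu[:, r + j], Qv[:, i])     # u_{r+j} v_i^T
        B = np.outer(Qu[:, i], Qv[:, r + j])     # u_i v_{r+j}^T
        for sign in (1.0, -1.0):
            Phi = (A + sign * B) / np.sqrt(2.0)
            kappa = sign * sigma[r + j] / sigma[i]
            residual = np.linalg.norm(weingarten(Phi) - kappa * Phi)
            max_residual = max(max_residual, residual)
```

A vanishing `max_residual` is consistent with the $2kr$ pairs $(\kappa_{i,r+j}^{\pm},\Phi_{i,r+j}^{\pm})$ being eigenpairs of $L_{R}(N)$.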

This theorem shows that the maximal curvature of $\mathscr{M}$ (for normalized normal directions $||N||=1$ ) is $\sigma_{r}(\mathfrak{R})^{-1}$ and hence diverges as the smallest singular value goes to 0. This fact confirms what is visible on Fig. 1: the manifold $\mathscr{M}$ can be seen as a collection of cones or as a multidimensional spiral, whose axes are the lower dimensional manifolds of matrices of rank strictly less than $r$ . Applying directly the formula Eq. 26 of Theorem 3.8, one obtains an explicit expression for the differential of the truncated SVD:

Theorem 4.8.

Consider $\mathfrak{R}\in\mathcal{M}_{l,m}$ with rank greater than $r$ and denote $\mathfrak{R}=\sum_{i=1}^{r+k}\sigma_{i}u_{i}v_{i}^{T}$ its SVD decomposition, where the singular values are ordered decreasingly: $\sigma_{1}\geq\sigma_{2}\geq\dots\geq\sigma_{r+k}$ . Suppose that the orthogonal projection $\Pi_{\mathscr{M}}(\mathfrak{R})=UZ^{T}$ of $\mathfrak{R}$ onto $\mathscr{M}$ is uniquely defined, that is $\sigma_{r}>\sigma_{r+1}$ . Then $\Pi_{\mathscr{M}}$ , the truncated SVD of order $r$ , is differentiable at $\mathfrak{R}$ and the differential $\mathrm{D}_{\mathfrak{X}}\Pi(\mathfrak{R})$ in a direction $\mathfrak{X}\in\mathcal{M}_{l,m}$ is given by the formula

[TABLE]

where $\Phi_{i,r+j}^{\pm}$ are the principal directions of Eq. 33. More explicitly,

[TABLE]

Proof 4.9.

The set $\{\mathfrak{R}\in\mathcal{M}_{l,m},\sigma_{r}(\mathfrak{R})>\sigma_{r+1}(\mathfrak{R})\}$ is open by continuity of the singular values, therefore condition (24) of Theorem 3.8 is fulfilled. The boundary $\overline{\mathscr{M}}\backslash\mathscr{M}$ is the set of matrices of rank strictly lower than $r$ , hence condition (25) is also fulfilled. Eq. 34 follows by replacing $\kappa_{i}(N)$ and $\Phi_{i}$ in Eq. 26 by the corresponding curvature eigenvalues $\pm\frac{\sigma_{r+j}}{\sigma_{i}}$ and eigenvectors $\Phi_{i,r+j}^{\pm}$ of Theorem 4.6.

Remark 4.10.

Dehaene [14] and Dieci and Eirola [17] have previously derived formulas for the time derivative of singular values and singular vectors of a smoothly varying matrix. One can also certainly use these results to find formula Eq. 35 by differentiating singular values $(\sigma_{i})$ and singular vectors $(u_{i}),(v_{i})$ separately in $\sum_{i=1}^{r}\sigma_{i}u_{i}v_{i}^{T}$ . In the present work, the proof of Theorem 4.8 does not require singular values to remain simple, and formula Eq. 34 is obtained directly from its geometric interpretation.
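The differential of the truncated SVD can be checked against finite differences. The sketch below assumes the expansion obtained by inverting $I-L_{R}(N)$ in the eigenbasis of Theorem 4.6, i.e. the tangent projection of $\mathfrak{X}$ plus corrections $\frac{\kappa}{1-\kappa}\langle\Phi,\mathfrak{X}\rangle\Phi$ over the principal directions with $\kappa=\pm\sigma_{r+j}/\sigma_{i}$ and $\Phi_{i,r+j}^{\pm}=\frac{1}{\sqrt{2}}(u_{r+j}v_{i}^{T}\pm u_{i}v_{r+j}^{T})$. Function names, sizes and the prescribed singular values are illustrative.

```python
import numpy as np

def truncated_svd(M, r):
    W, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (W[:, :r] * s[:r]) @ Vt[:r]

def truncated_svd_differential(M, X, r):
    """Directional derivative of the rank-r truncated SVD at M along X.

    Assumes sigma_r(M) > sigma_{r+1}(M).
    """
    W, s, Vt = np.linalg.svd(M, full_matrices=False)
    V = Vt.T
    U_r, V_r = W[:, :r], V[:, :r]
    # Tangent projection at R = Pi(M): U U^T X + (I - U U^T) X V V^T
    D = U_r @ (U_r.T @ X) + (X - U_r @ (U_r.T @ X)) @ V_r @ V_r.T
    # Curvature corrections kappa / (1 - kappa) <Phi, X> Phi
    for i in range(r):
        for j in range(r, len(s)):
            kappa = s[j] / s[i]
            A_dir = np.outer(W[:, j], V[:, i])   # u_{r+j} v_i^T
            B_dir = np.outer(W[:, i], V[:, j])   # u_i v_{r+j}^T
            a = W[:, j] @ X @ V[:, i]
            b = W[:, i] @ X @ V[:, j]
            D += kappa / (1 - kappa) * 0.5 * (a + b) * (A_dir + B_dir)
            D += -kappa / (1 + kappa) * 0.5 * (a - b) * (A_dir - B_dir)
    return D

rng = np.random.default_rng(5)
l, m, r = 6, 5, 2
Qu, _ = np.linalg.qr(rng.standard_normal((l, l)))
Qv, _ = np.linalg.qr(rng.standard_normal((m, m)))
s_true = np.array([5.0, 4.0, 1.0, 0.5, 0.2])     # clear gap between sigma_2 and sigma_3
M = (Qu[:, :m] * s_true) @ Qv.T
X = rng.standard_normal((l, m))

eps = 1e-6
fd = (truncated_svd(M + eps * X, r) - truncated_svd(M - eps * X, r)) / (2 * eps)
fd_error = np.linalg.norm(truncated_svd_differential(M, X, r) - fd)
```

The correction terms grow like $(1-\sigma_{r+1}/\sigma_{r})^{-1}$, so the agreement is expected to degrade as the gap between $\sigma_{r}$ and $\sigma_{r+1}$ closes.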

5 The Dynamically Orthogonal Approximation

The above results are now utilized for model order reduction. Following the introduction, the DO approximation is defined to be the dynamical system obtained by replacing the vector field $\mathcal{L}(t,\cdot)$ with its tangent projection on the manifold (Fig. 2(b)).

Definition 5.1.

The maximal solution in time of the reduced dynamical system on $\mathscr{M}$ ,

[TABLE]

is called the Dynamically Orthogonal (DO) approximation of Eq. 2. The solution $R(t)=U(t)Z^{T}(t)$ is governed by a dynamical system for the mode matrix $U$ and the coefficient matrix $Z$ such that $(\dot{U},\dot{Z})\in\mathcal{H}_{(U,Z)}$ satisfies the dynamically orthogonal condition $U^{T}\dot{U}=0$ at every instant:

[TABLE]

Remark 5.2.

Equations Eq. 37 are exactly those presented as DO equations in [64, 63]. With the notation of Eqs. 1 and 3, using $\langle\cdotp,\cdotp\rangle$ to denote the continuous dot product operator (an integral over the spatial domain) and $\mathbb{E}$ the expectation, they were written as the following set of coupled stochastic PDEs:

[TABLE]

However, when dealing with infinite dimensional Hilbert spaces, the vector space of solutions of Eq. 1 depends on the PDEs, which complicates the derivation of a general theory for Eq. 38. Considering the DO approximation as a computational method for evolving low rank matrices relaxes these issues through the finite-dimensional setting.

Remark 5.3.

One can relate Eq. 36 to projected dynamical systems encountered in optimization [53], where the manifold $\mathscr{M}$ is replaced with a compact convex set.

In the following, two justifications of the accuracy of this approximation are given. First, the DO approximation is shown to be the continuous limit of a scheme that would truncate the SVD of the full matrix solution after each time step, and hence is instantaneously optimal among any other possible model order reduced system. Then, its dynamics is compared to that of the best low rank approximation, yielding error bounds on global integration times. The efficiency of the DO approach in the context of the discretization of a stochastic PDE is not discussed here. These points are examined in [22] and in references cited therein.

5.1 The DO system applies instantaneously the truncated SVD

This paragraph details first a “computational” interpretation of the DO approximation. Consider the temporal integration of the dynamical system Eq. 2 over $(t^{n},t^{n+1})$ ,

[TABLE]

where $\overline{\mathcal{L}}(t,\mathfrak{R},\Delta t)$ denotes the full-space integral $\overline{\mathcal{L}}(t,\mathfrak{R},\Delta t)=\frac{1}{\Delta t}\int_{t}^{t+\Delta t}\mathcal{L}(s,\mathfrak{R}(s))\mathrm{d}s$ for the exact integration or the increment function [28] for a numerical integration. Examples of the latter include $\overline{\mathcal{L}}(t,\mathfrak{R},\Delta t)=\mathcal{L}(t,\mathfrak{R})$ for forward Euler and $\overline{\mathcal{L}}(t,\mathfrak{R},\Delta t)=\mathcal{L}(t+\Delta t/2,\mathfrak{R}+\Delta t/2\,\mathcal{L}(t,\mathfrak{R}))$ for a second-order Runge-Kutta scheme. Assume that the solution $\mathfrak{R}^{n}$ at time $t^{n}$ is well approximated by a rank $r$ matrix $R^{n}$ . A natural way to estimate the best rank $r$ approximation $\Pi_{\mathscr{M}}(\mathfrak{R}^{n+1})$ at the next time step is then to set

[TABLE]

Such a numerical scheme uses the truncated SVD, $\Pi_{\mathscr{M}}$ , to remove after each time step of the initial time-integration Eq. 39 the optimal amount of information required to constrain the rank of the solution. A data-driven adaptive version of this approach was for example used in [42, 43]. One can then look for a dynamical system for which Eq. 40 would be a temporal discretization. One then finds that, for any rank $r$ matrix $R\in\mathscr{M}$ ,

[TABLE]

holds true since the curvature term depending on $N=R-\Pi_{\mathscr{M}}(R)=0$ vanishes in Eq. 26, and $\overline{\mathcal{L}}(t,R,0)=\mathcal{L}(t,R)$ by consistency of the time marching with the exact integration Eq. 39 [28]. This implies, under sufficient regularity condition on $\mathcal{L}$ , that the continuous limit of the scheme Eq. 40 is the DO dynamical system Eq. 36.

Theorem 5.4.

Assume that the DO solution Eq. 36 is defined on a time interval $[0,T]$ discretized with $N_{T}$ time steps $\Delta t=T/N_{T}$ and denote $t^{n}=n\Delta t$ . Consider $R^{n}$ the sequence obtained from the class of schemes Eq. 40. Assume that $\mathcal{L}$ is Lipschitz continuous, that is there exists a constant $K$ such that

[TABLE]

Then the sequence $R^{n}$ converges uniformly to the DO solution $R(t)$ in the following sense:

[TABLE]

Proof 5.5.

It is sufficient to check that the scheme Eq. 40 is both consistent and stable (see [28]). Denote $\Phi$ the increment function of the scheme Eq. 40:

[TABLE]

with $g(R,t,\tau,\Delta t)=R+\tau\Delta t\overline{\mathcal{L}}(t,R,\Delta t)$ . Consider a compact neighborhood $\mathcal{U}\subset\mathcal{M}_{l,m}$ of the trajectory $R(t)$ on the interval $[0,T]$ , sufficiently thin such that $\mathcal{U}$ does not intersect the skeleton of $\mathscr{M}$ . In particular, $\Pi_{\mathscr{M}}$ is differentiable with respect to $R$ on the compact neighborhood $\mathcal{U}$ , hence Lipschitz continuous. The consistency of Eq. 40 and the continuity of $\Phi$ on $[0,T]\times\mathcal{U}\times\mathbb{R}$ follow from Eq. 41. For usual time marching schemes (e.g. Runge-Kutta), the Lipschitz condition Eq. 42 also holds for the map $R\mapsto\overline{\mathcal{L}}(t,R,\Delta t)$ . Therefore $\Phi$ is also Lipschitz continuous with respect to $R$ on $\mathcal{U}$ by composition. This is a sufficient stability condition.

As such, the DO approximation can be interpreted as the dynamical system that applies instantaneously the truncated SVD to constrain the rank of the solution. Therefore, other reduced order models of the form Eq. 4 are characterized by larger errors on short integration times for solutions whose initial value lies on $\mathscr{M}$ .
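The convergence of the truncate-after-each-step scheme toward the DO solution can be observed on a toy problem. The sketch below is an illustration under stated assumptions: the vector field $\mathcal{L}(R)=AR+C$, the matrices $A$ and $C$, the sizes and the step counts are all hypothetical. The DO system is written in the $(U,Z)$ coordinates as $\dot{U}=(I-UU^{T})\mathcal{L}(UZ^{T})Z(Z^{T}Z)^{-1}$ and $\dot{Z}=\mathcal{L}(UZ^{T})^{T}U$, which are the horizontal coordinates of the tangent projection of $\mathcal{L}$, and is integrated with a fine Runge-Kutta scheme as a reference. The scheme of Eq. 40 is implemented with forward Euler followed by a truncated SVD.

```python
import numpy as np

rng = np.random.default_rng(4)
l, m, r, T = 6, 5, 2, 0.5
A = 0.1 * rng.standard_normal((l, l))
C = 0.2 * rng.standard_normal((l, m))

def L(R):
    """Hypothetical Lipschitz vector field; the forcing C pushes R off the manifold."""
    return A @ R + C

def truncated_svd(M, r):
    W, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (W[:, :r] * s[:r]) @ Vt[:r]

def do_rhs(U, Z):
    """Right-hand side of the DO system in the (U, Z) coordinates."""
    F = L(U @ Z.T)
    dU = (F - U @ (U.T @ F)) @ Z @ np.linalg.inv(Z.T @ Z)
    dZ = F.T @ U
    return dU, dZ

def do_reference(U, Z, n_steps):
    """Classical RK4 integration of the DO system, used as a reference."""
    dt = T / n_steps
    for _ in range(n_steps):
        k1 = do_rhs(U, Z)
        k2 = do_rhs(U + 0.5 * dt * k1[0], Z + 0.5 * dt * k1[1])
        k3 = do_rhs(U + 0.5 * dt * k2[0], Z + 0.5 * dt * k2[1])
        k4 = do_rhs(U + dt * k3[0], Z + dt * k3[1])
        U = U + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        Z = Z + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return U @ Z.T

def projected_euler(R, n_steps):
    """Forward Euler step followed by a rank-r truncated SVD."""
    dt = T / n_steps
    for _ in range(n_steps):
        R = truncated_svd(R + dt * L(R), r)
    return R

U0, _ = np.linalg.qr(rng.standard_normal((l, r)))
V0, _ = np.linalg.qr(rng.standard_normal((m, r)))
Z0 = V0 * np.array([3.0, 2.0])               # initial singular values 3 and 2
R_do = do_reference(U0, Z0, 2000)
err_coarse = np.linalg.norm(projected_euler(U0 @ Z0.T, 50) - R_do)
err_fine = np.linalg.norm(projected_euler(U0 @ Z0.T, 400) - R_do)
```

With forward Euler as the underlying time marching, the discrepancy is expected to decrease roughly linearly with $\Delta t$, in line with the consistency and stability argument of the proof.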

Remark 5.6.

*Other dynamical systems that perform instantaneous matrix operations have been derived in [10, 70], and in [14] (e.g. Lemma 3.4 and Corollary 3.5) or [17] (sections 2.1 and 2.3.) for tracking the full SVD or QR decomposition. Continuous SVD has been combined with adaptive Kalman filtering in uncertainty quantification to continuously adapt the dominant subspace supporting the stochastic solution [42, 43, 41]. All of these results utilized the instantaneous truncated SVD concept and formed the computational basis of the continuous DO dynamical system. In fact, the dominant singular vectors of state transition matrices and other operators have found varied applications in atmospheric and ocean sciences for some time [19, 20, 56, 33, 45, 51, 35, 16]. *

5.2 The DO approximation is close to the dynamics of the best low rank approximation of the original solution

Ideally, a model order reduced solution $R(t)$ would coincide at all times with the best rank $r$ approximation $\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ , so as to keep the error $||\mathfrak{R}(t)-R(t)||$ minimal. However, $\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ is not the solution of a reduced system of the form Eq. 4 as its time derivative depends on the knowledge of the true solution $\mathfrak{R}$ in the full space $\mathcal{M}_{l,m}$ . Indeed, formula Eq. 35 for the differential of the SVD yields the following system of ODEs for the evolution of modes and coefficients of the best rank- $r$ approximation $\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ :

[TABLE]

where the (time-dependent) SVD of $\mathfrak{R}(t)$ at the time $t$ is $\sum_{i=1}^{r+k}\sigma_{i}u_{i}v_{i}^{T}$ with $r+k=\min(m,l)$ (allowing possibly $\sigma_{r+j}=0$ for $1\leq j\leq k$ ). One therefore sees from this governing differential equation Eq. 44 of the best rank- $r$ approximation that the reduced DO system Eq. 36 is obtained by (i) replacing the derivative $\dot{\mathfrak{R}}=\mathcal{L}(t,\mathfrak{R})$ with the approximation $\mathcal{L}(t,R)$ (first terms in each of the right-hand sides of Eq. 44), and (ii) neglecting the dynamics corresponding to the interactions between the rank- $r$ approximation (singular values and vectors of order $1\leq i\leq r$ ) and the neglected normal component (singular values and vectors of order $r+j$ for $1\leq j\leq k$ ). These interactions are the last summation terms in each of the right-hand sides of Eq. 44. Estimating these interactions in all generality would require, in addition to the knowledge of a rank $r$ approximation $R\simeq\Pi_{\mathscr{M}}(\mathfrak{R})$ , either external observations [43] or closure models [76], so as to estimate the otherwise neglected normal component $\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})=\sum_{j=1}^{k}\sigma_{r+j}u_{r+j}v_{r+j}^{T}$ .

Comparing the dynamics Eq. 37 of the DO approximation to that of the governing differential equation Eq. 44 of the best rank- $r$ approximation, a bound for the growth of the DO error is now obtained.

Theorem 5.7.

Assume that both the original solution $\mathfrak{R}(t)\in\mathcal{M}_{l,m}$ (Eq. 2) and its DO approximation $R(t)$ (Eq. 36) are defined on a time interval $[0,T]$ and that the following conditions hold:

1. $\mathcal{L}$ is Lipschitz continuous, i.e. Eq. 42 holds.

2. The original (true) solution $\mathfrak{R}(t)$ remains close to the low rank manifold $\mathscr{M}$ , in the sense that $\mathfrak{R}(t)$ does not cross the skeleton of $\mathscr{M}$ on $[0,T]$ , i.e. there is no crossing of the singular value of order $r$ :

[TABLE]

Then, the error of the DO approximation $R(t)$ (Eq. 36) remains controlled by the best approximation error $||\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R}(t))||$ on $[0,T]$ :

[TABLE]

where $\eta$ is the constant

[TABLE]

Proof 5.8.

A proof is given in Appendix B.

This statement improves the result expressed in [37] (Theorem 5.1), since no assumption is made on the smallness of the best approximation error $||\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})||$ , nor on the boundedness of $||R-\Pi_{\mathscr{M}}(\mathfrak{R})||$ . Theorem 5.7 also highlights two sufficient conditions for the error committed by the DO approximation to remain small:

Condition 1

The discrete operator $\mathcal{L}$ must not be too sensitive to the error $\mathfrak{R}(t)-R(t)$ , namely the Lipschitz constant $K$ must be small. This error is commonly encountered by any approximation made for evaluating the operator of a dynamical system (as a consequence of Gronwall’s lemma [29]). The Lipschitz constant $K$ also quantifies how fast the vector field $\mathcal{L}$ may deviate from its values when getting away from the low rank manifold $\mathscr{M}$ .

Condition 2

Independently of the choice of the reduced order model, the solution of the initial system Eq. 2, $\mathfrak{R}(t)$ , must remain close to the manifold $\mathscr{M}$ , or in other words, must remain far from the skeleton $\textrm{Sk}(\mathscr{M})$ of $\mathscr{M}$ . As visible in Fig. 3, the best rank $r$ approximation $\Pi_{\mathscr{M}}(\mathfrak{R})$ of $\mathfrak{R}$ exhibits a jump when $\mathfrak{R}$ crosses the skeleton, i.e. when $\sigma_{r}(\mathfrak{R})=\sigma_{r+1}(\mathfrak{R})$ occurs. At that point, the discontinuity of $\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ cannot be tracked by the DO or any other smooth dynamical approximation. Condition 2 in some sense supersedes the stronger condition of “smallness of the initial truncation error” of the error analysis of [37]. Indeed, when $\sigma_{r}(\mathfrak{R})\simeq\sigma_{r+1}(\mathfrak{R})$ occurs, as observed numerically in [52], the DO solution may then diverge sharply from the SVD truncation. From the point of view of model order reduction, the resulting error can be related to the evolution of the residual $\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})$ that is not accounted for by the reduced order model. When the crossing of singular values occurs, neglected modes in the approximation Eq. 3 become “dominant”, but cannot be captured by a reduced order model that has evolved only the modes that were initially dominant. In such cases, one has either to restart the simulations from the initial conditions with a larger subspace size, or to increase the size of the DO subspace and apply corrections from external information. The latter learning of the subspace can be done from measurements or from additional Monte-Carlo simulations and breeding of the best rank- $r$ approximation [43, 35, 65].

Last, it should be noted that the growth rate $\eta$ (Eq. 46) of the error increases as the evolved trajectory becomes close to singular, i.e. when $\sigma_{r}(\mathfrak{R}(t))$ goes to zero. This growth rate comes mathematically from the Gronwall estimates of the proofs, and is intuitively related to the fact that the tangent projection $\Pi_{\mathcal{T}}$ in Eq. 36 is applied at the location of the DO solution $R(t)$ instead of that of the best approximation $\Pi_{\mathscr{M}}(\mathfrak{R}(t))$. If the evolved trajectory is close to singular, the local curvature of $\mathscr{M}$ experienced by the DO solution $R(t)$ and the best approximation $\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ is high. Therefore the tangent spaces ${\mathcal{T}(R(t))}$ and ${\mathcal{T}(\Pi_{\mathscr{M}}(\mathfrak{R}(t)))}$ may be oriented very differently because of this curvature, resulting in increased error when approximating the tangent projection operator $\Pi_{\mathcal{T}(\Pi_{\mathscr{M}}(\mathfrak{R}(t)))}$ by $\Pi_{\mathcal{T}(R(t))}$ in the DO system Eq. 36.

Remark 5.9.

Theorems 5.4 and 5.7 may be generalized in a straightforward manner to the case of any smooth embedded manifold $\mathscr{M}\subset E$ (Theorems 2.5 and 2.6 in [21]).

6 Optimization on the fixed rank matrix manifold for tracking the best low rank approximation

This section applies the framework of Riemannian matrix optimization [18, 2] as an alternative approach to the direct tracking of the truncated SVD of a time-dependent matrix $\mathfrak{R}(t)\in\mathcal{M}_{l,m}$ . At the end, we provide a remark (Remark 6.6) linking the two approaches within the context of the DO system.

Consider a given (full-rank) matrix $\mathfrak{R}\in\mathcal{M}_{l,m}$ and recall that $\Pi_{\mathscr{M}}(\mathfrak{R})$ , when it is non-ambiguously defined, is the unique minimizer of the distance functional

$$J(R)=\frac{1}{2}||\mathfrak{R}-R||^{2},\qquad R\in\mathscr{M}.\tag{47}$$

Riemannian optimization algorithms, namely gradient descents and Newton methods on the fixed rank manifold $\mathscr{M}$, are now used to provide alternatives to the more standard direct algebraic algorithms [27] for evaluating the truncated SVD $\Pi_{\mathscr{M}}(\mathfrak{R})$. Such optimizations can be useful to dynamically update the best low rank approximation of a time dependent matrix $\mathfrak{R}(t)$: for a sufficiently small time step $\Delta t$, $R(t)=\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ is expected to be close to $R(t+\Delta t)=\Pi_{\mathscr{M}}(\mathfrak{R}(t+\Delta t))$, hence $\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ provides a good initial guess for the minimization of $R\mapsto||\mathfrak{R}(t+\Delta t)-R||$. The minimization of the distance functional $J$ has already been considered in the matrix optimization community [3, 75, 50], which derived gradient descent and Newton methods on the fixed rank manifold, but not in the case of the metric inherited from the ambient space $\mathcal{M}_{l,m}$ (Eq. 10), which is done in what follows. As a benefit of this “extrinsic” approach already noticed in [4], the covariant Hessian of $J$ relates directly to the Weingarten map at critical points: this will allow obtaining the convergence of the gradient descent for almost every initial data (Proposition 6.3).

The ingredients required for the minimization of $J$ on the manifold $\mathscr{M}$ are first derived, namely the covariant gradient and Hessian. As reviewed in [18], usual optimization algorithms such as gradient and Newton methods can be straightforwardly adapted to matrix manifolds. The differences with their Euclidean counterparts are that: (i) usual gradients and Hessians must be replaced by their covariant equivalents; (ii) one needs to follow geodesics instead of straight lines to move on the manifold; and, (iii) directions followed at the previous time steps, needed for example in the conjugate gradient method, must be transported to the current location (Eq. 18). Covariant gradient and Hessian are recalled in the following definition (for details, see [3], chapter 5).

Definition 6.1.

Let $J$ be a smooth function defined on $\mathscr{M}$ and $R\in\mathscr{M}$ . The covariant gradient of $J$ at $R$ is the unique vector $\nabla J\in{\mathcal{T}(R)}$ such that

[TABLE]

The covariant Hessian $\mathcal{H}J$ of $J$ at $R$ is the linear map on ${\mathcal{T}(R)}$ defined by

[TABLE]

and the following second order Taylor approximation of $J$ holds:

[TABLE]

The following proposition (see [4]) explains how these quantities are related to the usual gradient and Hessian, so that they become accessible for computations.

Proposition 6.2.

Let $J$ be a smooth function defined in the ambient space $\mathcal{M}_{l,m}$ and denote $\mathrm{D}J$ and $\mathrm{D}^{2}J$ its respective Euclidean gradient and Hessian. Then the covariant gradient and Hessian are given by

[TABLE]

Applying directly Proposition 6.2, the gradient and the Hessian of $J$ at $R=UZ^{T}\in\mathscr{M}$ are given by:

[TABLE]

where $N_{UZ^{T}}(\mathfrak{R})=(I-\Pi_{\mathcal{T}(UZ^{T})})(\mathfrak{R}-UZ^{T})=(I-UU^{T})\mathfrak{R}(I-Z(Z^{T}Z)^{-1}Z^{T})$ is the orthogonal projection of $\mathfrak{R}-R$ onto the normal space. The Newton direction $X$ is found by solving the linear system $\mathcal{H}J(X)=-\nabla J(R)$ , that reduces to

$$\left\{\begin{aligned}X_{U}A+BX_{Z}&=E\\ B^{T}X_{U}+X_{Z}&=F,\end{aligned}\right.$$

with $A=Z^{T}Z$, $B=-N_{UZ^{T}}(\mathfrak{R})$, $E=(I-UU^{T})\mathfrak{R}Z$ and $F=-Z+\mathfrak{R}^{T}U$. This requires solving the Sylvester equation $X_{U}A-BB^{T}X_{U}=E-BF$ for $X_{U}$, which can in theory be done using standard techniques [36], before computing $X_{Z}$ from $X_{Z}=F-B^{T}X_{U}$.
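As a minimal sketch of this linear solve (not the authors' implementation), the snippet below builds $A$, $B$, $E$, $F$ for random data and solves the Sylvester equation by diagonalizing the symmetric matrices $A$ and $BB^{T}$, which decouples the equation entry by entry. The sizes, the random seed and all variable names are illustrative assumptions, and the approach assumes that no eigenvalue of $A$ coincides with an eigenvalue of $BB^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
l, m, r = 8, 6, 2

# Hypothetical data: a full matrix and a rank-r point R = U Z^T with U^T U = I.
R_full = rng.standard_normal((l, m))
U, _ = np.linalg.qr(rng.standard_normal((l, r)))
Z = rng.standard_normal((m, r))

A = Z.T @ Z
P_U = U @ U.T                                   # projector onto span(U)
P_Z = Z @ np.linalg.solve(A, Z.T)               # projector onto span(Z)
N = (np.eye(l) - P_U) @ R_full @ (np.eye(m) - P_Z)   # normal part of R_full - U Z^T
B = -N
E = (np.eye(l) - P_U) @ R_full @ Z
F = -Z + R_full.T @ U

# Solve X_U A - (B B^T) X_U = C with C = E - B F.
lam, Q = np.linalg.eigh(A)                      # A = Q diag(lam) Q^T
mu, P = np.linalg.eigh(B @ B.T)                 # B B^T = P diag(mu) P^T
C = E - B @ F
Y = (P.T @ C @ Q) / (lam[None, :] - mu[:, None])    # needs lam_j != mu_i
X_U = P @ Y @ Q.T
X_Z = F - B.T @ X_U

print("Sylvester residual:", np.linalg.norm(X_U @ A - B @ B.T @ X_U - C))
print("gauge U^T X_U:     ", np.linalg.norm(U.T @ X_U))
```

The computed $X_{U}$ satisfies $U^{T}X_{U}=0$ up to rounding, as expected since $U^{T}E=0$ and $U^{T}B=0$. For large $l$, a dedicated Sylvester solver would likely be preferable to the full eigendecomposition of $BB^{T}$ used here.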

It is now proven that the distance function $J$ may admit several critical points, but a unique local, hence global, minimum on $\mathscr{M}$. As a consequence, saddle points of $J$ are unstable equilibrium solutions of the gradient flow $\dot{R}=-\nabla J(R)$ and hence are expected to be avoided by gradient descent, which will converge in practice to the global minimum $\Pi_{\mathscr{M}}(\mathfrak{R})$. This “almost sure” convergence guarantee for the gradient descent may be compared to probabilistic analyses investigated in more general contexts [59, 77]. Our result also shows that one cannot expect the Newton method to converge for initial guesses that are far from the optimum. Indeed, this method seeks a zero of the gradient $\nabla J$ rather than a true minimum, and hence may converge to, or oscillate around, any of the saddle points of the objective function.

Proposition 6.3.

Consider $\mathfrak{R}\in\mathcal{M}_{l,m}$ such that its projection onto $\mathscr{M}$ is well defined, that is $\sigma_{r}(\mathfrak{R})>\sigma_{r+1}(\mathfrak{R})$. Then the distance function $J$ to $\mathfrak{R}$ (Eq. 47) admits no other local minima than $\Pi_{\mathscr{M}}(\mathfrak{R})$. In other words, for almost any initial rank $r$ matrix $U(0)Z(0)^{T}$, the solution $U(t)Z(t)^{T}$ of the gradient flow

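$$\left\{\begin{aligned}\dot{U}&=(I-UU^{T})\mathfrak{R}Z(Z^{T}Z)^{-1}\\ \dot{Z}&=\mathfrak{R}^{T}U-Z\end{aligned}\right.$$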

converges to $\Pi_{\mathscr{M}}(\mathfrak{R})$, the rank $r$ truncated SVD of $\mathfrak{R}$.
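A minimal numerical sketch of this convergence is given below. It assumes that the gradient flow reads $\dot{U}=(I-UU^{T})\mathfrak{R}Z(Z^{T}Z)^{-1}$, $\dot{Z}=\mathfrak{R}^{T}U-Z$, the form suggested by the expressions of $E$ and $F$ above. The discretization is a choice made for illustration only: a sequential Euler step of size `h` (first $Z$, then $U$) followed by a QR re-orthonormalization of $U$ that leaves the product $UZ^{T}$ unchanged. With `h = 1` this step reduces to a subspace iteration sweep; it is not the geodesic-based descent of [18]. The test matrix, its singular values and the function name `gradient_flow_step` are hypothetical.

```python
import numpy as np

def gradient_flow_step(R_full, U, Z, h=1.0):
    """Sequential Euler step of size h, then re-orthonormalisation of U."""
    l = U.shape[0]
    Z = Z + h * (R_full.T @ U - Z)                        # dZ/dt = R^T U - Z
    RZ = (np.eye(l) - U @ U.T) @ R_full @ Z
    U = U + h * np.linalg.solve(Z.T @ Z, RZ.T).T          # dU/dt = (I - UU^T) R Z (Z^T Z)^{-1}
    Q, T = np.linalg.qr(U)                                # U = Q T
    return Q, Z @ T.T                                     # Q (Z T^T)^T equals U Z^T

rng = np.random.default_rng(1)
l, m, r = 30, 20, 3

# Hypothetical test matrix with a clear gap between sigma_3 = 8 and sigma_4 = 1.
sigma = np.array([10.0, 9.0, 8.0] + [1.0 / (k + 1) for k in range(m - 3)])
P, _ = np.linalg.qr(rng.standard_normal((l, m)))
W, _ = np.linalg.qr(rng.standard_normal((m, m)))
R_full = (P * sigma) @ W.T

# Random rank-r initial guess.
U, _ = np.linalg.qr(rng.standard_normal((l, r)))
Z = rng.standard_normal((m, r))

for _ in range(60):
    U, Z = gradient_flow_step(R_full, U, Z, h=1.0)

Us, s, Vt = np.linalg.svd(R_full, full_matrices=False)
best = (Us[:, :r] * s[:r]) @ Vt[:r, :]
err = np.linalg.norm(U @ Z.T - best)
print("distance to the truncated SVD:", err)
```

Smaller values of `h` give a damped variant that follows the continuous flow more closely, at the price of more iterations; that regime is not exercised here.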

Proof 6.4.

It is known from Proposition 3.2 that the points $R$ for which $\nabla J$ vanishes are such that $\mathrm{D}J=R-\mathfrak{R}\in{\mathcal{N}(R)}$ is a normal vector. Since in addition $\mathrm{D}^{2}J=I$ , Proposition 6.2 yields the identity

$$\mathcal{H}J=I-L_{R}(N),$$

where $N=-(I-\Pi_{\mathcal{T}(R)})(\mathrm{D}J)=-\mathrm{D}J=\mathfrak{R}-R\in{\mathcal{N}(R)}$, since $\nabla J=\Pi_{\mathcal{T}(R)}(\mathrm{D}J)$ vanishes at $R$. Let $\mathfrak{R}=\sum_{i=1}^{r+k}\sigma_{i}(\mathfrak{R})u_{i}v_{i}^{T}$ be the SVD of $\mathfrak{R}$. For $\mathfrak{R}-R$ to be a normal vector, $R$ must necessarily be of the form $R=\sum_{i\in A}\sigma_{i}u_{i}v_{i}^{T}$ where $A$ is a subset of $r$ indices $1\leq i\leq r+k$. Then the minimum eigenvalue of the Hessian $\mathcal{H}J$ is $1-\frac{\sigma_{1}(N)}{\sigma_{r}(R)}$, which is positive if and only if $\sigma_{r}(R)>\sigma_{1}(N)$. This happens only for $R=\Pi_{\mathscr{M}}(\mathfrak{R})$.
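The counting argument of this proof can be checked on a small example. The sketch below enumerates every index subset $A$ of size $r$ and evaluates $1-\sigma_{1}(N)/\sigma_{r}(R)$, where $\sigma_{r}(R)$ is the smallest retained singular value and $\sigma_{1}(N)$ the largest discarded one. The singular values and the helper name `min_hessian_eigenvalue` are illustrative assumptions.

```python
from itertools import combinations

def min_hessian_eigenvalue(sigma, subset):
    """1 - sigma_1(N) / sigma_r(R) for the critical point keeping the modes in `subset`."""
    kept = [sigma[i] for i in subset]
    dropped = [s for i, s in enumerate(sigma) if i not in subset]
    return 1.0 - max(dropped) / min(kept)

sigma = [5.0, 4.0, 3.0, 2.0, 1.0]   # hypothetical singular values, r + k = 5
r = 2

minima = []
for subset in combinations(range(len(sigma)), r):
    value = min_hessian_eigenvalue(sigma, subset)
    print(subset, round(value, 3))
    if value > 0:
        minima.append(subset)

print("critical points with positive definite Hessian:", minima)
```

Only the subset made of the $r$ largest singular values yields a positive value; every other critical point has at least one negative Hessian eigenvalue and is therefore a saddle.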

Remark 6.5.

The reader is referred to [34] for details regarding the almost sure convergence of sufficiently smooth gradient flows towards the unique minimizer of a function (Morse theory).

In Fig. 4, a matrix $\mathfrak{R}\in\mathcal{M}_{l,m}$ with $m=100$ and $l=150$ is considered, with singular values chosen to be equally spaced in the interval $[1,10]$. Three optimization algorithms detailed in [18] (gradient descent with fixed step, conjugate gradient descent, and Newton method) are implemented to find the best rank $r=5$ approximation of $\mathfrak{R}$, with a random initialization. Convergence curves are plotted in Fig. 4: linear and quadratic rates, characteristic of gradient and Newton methods respectively, are obtained. As expected from Proposition 6.3, gradient descents globally converge to the truncated SVD, while Newton iterations may be attracted to any saddle point.
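A test matrix with these dimensions and singular values can be generated as in the sketch below. The choice of random orthonormal singular vectors and the seed are assumptions, since the text does not specify them. The snippet also evaluates the reference error $||\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})||$, which in the Frobenius norm equals $(\sum_{i>r}\sigma_{i}^{2})^{1/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
l, m, r = 150, 100, 5

sigma = np.linspace(10.0, 1.0, m)                    # equally spaced in [1, 10], decreasing
P, _ = np.linalg.qr(rng.standard_normal((l, m)))     # assumed random left singular vectors
W, _ = np.linalg.qr(rng.standard_normal((m, m)))     # assumed random right singular vectors
R_full = (P * sigma) @ W.T

Us, s, Vt = np.linalg.svd(R_full, full_matrices=False)
best = (Us[:, :r] * s[:r]) @ Vt[:r, :]               # rank-r truncated SVD

error = np.linalg.norm(R_full - best)
expected = np.sqrt(np.sum(sigma[r:] ** 2))
gap_ratio = sigma[r] / sigma[r - 1]                  # sigma_{r+1} / sigma_r

print("truncation error:", error, "expected:", expected)
print("sigma_{r+1} / sigma_r:", gap_ratio)
```

For this spectrum the ratio $\sigma_{r+1}/\sigma_{r}$ is about $0.99$, so the smallest Hessian eigenvalue $1-\sigma_{r+1}/\sigma_{r}$ at the minimum is close to zero, which suggests a slow linear rate for fixed-step gradient descent.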

Remark 6.6.

The above gradient descent and Newton methods can be combined with previously-derived numerical schemes for the time-integrated DO equations (Eq. 40). One class of schemes consists of discretizing the ODEs Eq. 37 in time, as in [64, 73, 52, 37]. Another follows Eq. 40 directly and aims to compute the SVD truncation $\Pi_{\mathscr{M}}(\mathfrak{R})$ of $\mathfrak{R}=UZ^{T}+\Delta t\,\overline{\mathcal{L}}(t,UZ^{T},\Delta t)$, where the increment function can be that of Euler or of higher-order explicit time marching (of course, the total rank of this $\mathfrak{R}$ depends on the dynamics and numerical scheme, and can be greater than $r$). Examining the expression of the gradient of $J$ (Eq. 50), one time-step of the above schemes can be interpreted as one gradient descent step for minimizing the functional $J$. Therefore, optimization algorithms on the Riemannian manifold $\mathscr{M}$ can be combined with such DO time-stepping schemes, as further investigated in [22]. A key advantage of such optimization is the capability of altering the rank $r$ of the dynamical approximation over a time step or stage (e.g. a rank $p>r$ approximation can be used in the target cost functional $J$). These strategies may also be utilized for the computation of nonlinear singular vectors [74] or for continuous dominant subspace estimation [41]. Finally, they can also be combined with adaptive learning schemes [43, 45, 65] which use system measurements and/or Monte-Carlo breeding nonlinear simulations to estimate the missing fastest growing modes. Such additional information can then correct the predictor of the SVD of $\mathfrak{R}(t+\Delta t)$ in directions orthogonal to the discrete DO increments and essentially increase the subspace size, e.g. when the estimates of $\sigma_{r+1}(\mathfrak{R}(t))$ become close to those of $\sigma_{r}(\mathfrak{R}(t))$.
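The second class of schemes can be sketched as follows, assuming an explicit Euler increment and a direct SVD in place of the Riemannian optimization. The operators `L_linear` and `L_forced`, the matrix sizes and the function name `do_euler_step` are hypothetical choices made for illustration.

```python
import numpy as np

def truncated_svd(M, r):
    """Best rank-r approximation of M in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def do_euler_step(R, r, dt, L):
    """Rank-r truncation of one explicit Euler step of dR/dt = L(R)."""
    return truncated_svd(R + dt * L(R), r)

rng = np.random.default_rng(0)
l, m, r, dt = 12, 9, 3, 1e-2
A = rng.standard_normal((l, l))
G = rng.standard_normal((l, m))
R0 = rng.standard_normal((l, r)) @ rng.standard_normal((r, m))   # rank-r initial state

# Linear dynamics L(R) = A R keep the Euler step at rank r: the truncation is exact.
L_linear = lambda R: A @ R
full_step = R0 + dt * L_linear(R0)
R1 = do_euler_step(R0, r, dt, L_linear)
print("truncation error, linear case:", np.linalg.norm(full_step - R1))

# A full-rank forcing G raises the rank of the Euler step above r.
L_forced = lambda R: A @ R + G
forced_step = R0 + dt * L_forced(R0)
R1_forced = do_euler_step(R0, r, dt, L_forced)
print("rank before truncation:", np.linalg.matrix_rank(forced_step))
print("truncation error, forced case:", np.linalg.norm(forced_step - R1_forced))
```

In the forced case, the discarded part of the step is the residual that the rank $r$ model cannot represent; replacing `truncated_svd` by a few iterations of a manifold optimization started from `R0` would be the variant discussed in this remark.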

7 Conclusion

A geometric approach was developed for dynamical model-order reduction, through the analysis of the embedded geometry of the fixed rank manifold $\mathscr{M}$. The extrinsic curvatures of matrix manifolds were studied and geodesic equations obtained. The relationships among these notions and the differential of the orthogonal projection of the original system dynamics onto the tangent spaces of the manifold were derived and linked to the DO approximation. These geometric results allowed the derivation of the differential of the truncated SVD, interpreted as an orthogonal projection onto the fixed rank matrix manifold. The DO approximation, with its instantaneous application of the SVD truncation of the stochastic/parametric dynamics, was shown to be the natural dynamical reduced-order model that is optimal on small integration times among all other reduced-order models that evaluate the operator of the full-space dynamics exclusively onto low rank approximations. Additionally, the explicit dynamical system satisfied by the best low rank approximation was derived and used to sharpen the error analysis of the DO approximation.

The DO method was related to Riemannian matrix optimization, for which gradient descent methods were applied and shown capable of adaptively tracking the best low rank approximation of dynamic matrices. This may prove beneficial in the integration of the time stepping of the DO approximation. Such approaches, in contrast with classic numerical integrations of the governing differential equations for the DO modes and their coefficients, open new avenues for efficient DO numerical schemes. In general, there are now many promising directions for developing new, efficient, dynamic reduced-order methods, based on the geometry and shape of the full-space dynamics. Opportunities abound over a wide range of needs and applications of uncertainty quantification and dynamical system analyses and optimization in oceanic and atmospheric sciences, thermal-fluid sciences and engineering, electrical engineering, and chemical and biological sciences and engineering.

Acknowledgments

We thank the members of the MSEAS group at MIT as well as Camille Gillot, Christophe Zhang, and Saviz Mowlavi for insightful discussions related to this topic. We are grateful to the Office of Naval Research for support under grants N00014-14-1-0725 (Bays-DA) and N00014-14-1-0476 (Science of Autonomy – LEARNS) to the Massachusetts Institute of Technology.

Appendix A Proof of Theorem 3.8

Lemma A.1.

Let $\Omega$ be an open set over which the projection $\Pi_{\mathscr{M}}$ is uniquely defined by Eq. 24 and such that condition Eq. 25 holds. Then $\Pi_{\mathscr{M}}$ is continuous on $\Omega$.

Proof A.2.

Consider a sequence $\mathfrak{R}_{n}$ converging in $E$ to $\mathfrak{R}$ and denote $\Pi_{\mathscr{M}}(\mathfrak{R}_{n})$ the corresponding projections. Let $\epsilon>0$ be a real such that $\forall n\geq 0,||\mathfrak{R}_{n}-\mathfrak{R}||<\epsilon$ . Since

[TABLE]

the sequence $\Pi_{\mathscr{M}}(\mathfrak{R}_{n})$ is bounded. Denote $R\in\overline{\mathscr{M}}$ a limit point of this sequence. Passing to the limit in the inequality $||\mathfrak{R}_{n}-\Pi_{\mathscr{M}}(\mathfrak{R}_{n})||\leq||\mathfrak{R}_{n}-\Pi_{\mathscr{M}}(\mathfrak{R})||$, one obtains $||\mathfrak{R}-R||\leq||\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})||$. The uniqueness of the projection, and the fact that there is no $R\in\overline{\mathscr{M}}\backslash\mathscr{M}$ satisfying this inequality, show that $R=\Pi_{\mathscr{M}}(\mathfrak{R})$. Since the bounded sequence $(\Pi_{\mathscr{M}}(\mathfrak{R}_{n}))$ has a unique limit point, one deduces the convergence $\Pi_{\mathscr{M}}(\mathfrak{R}_{n})\rightarrow\Pi_{\mathscr{M}}(\mathfrak{R})$ and hence the continuity of the projection map at $\mathfrak{R}$.

Lemma A.3.

At any point $\mathfrak{R}\in\Omega$, any principal curvature $\kappa_{i}(N)$ in the direction $N$ at $\Pi_{\mathscr{M}}(\mathfrak{R})$ must satisfy $\kappa_{i}(N)<1$.

Proof A.4.

It is shown in Proposition 6.2 that the covariant Hessian of the distance function $J(R)=\frac{1}{2}||\mathfrak{R}-R||^{2}$ at $R=\Pi_{\mathscr{M}}(\mathfrak{R})$ is given by

$$\mathcal{H}J=I-L_{R}(N),$$

where $N$ is the normal direction $N=\mathfrak{R}-\Pi_{\mathscr{M}}(\mathfrak{R})$ . Since $R=\Pi_{\mathscr{M}}(\mathfrak{R})$ must be a local minimum of $J$ , this Hessian must be positive, namely any eigenvalue $\kappa_{i}(N)$ of the Weingarten map $L_{R}(N)$ must satisfy $1-\kappa_{i}(N)\geq 0$ . Now, consider $s>1$ such that $R+sN\in\Omega$ and notice that $||R+sN-\Pi_{\mathscr{M}}(\mathfrak{R})||=s||N||$ . Since

[TABLE]

the uniqueness of the projection in $\Omega$ implies that $\Pi_{\mathscr{M}}(R+sN)=R$ (i.e. the projection is invariant along orthogonal rays). The linearity of the Weingarten map in $N$ implies $\kappa_{i}(sN)=s\kappa_{i}(N)$, hence $\kappa_{i}(N)\leq\frac{1}{s}<1$, which concludes the proof.

Proof A.5 (Proof of Theorem 3.8).

Consider the function $f(\mathfrak{R},R)=\Pi_{\mathcal{T}(R)}(R-\mathfrak{R})$ defined for $\mathfrak{R}\in E$ and $R\in\mathscr{M}$. The differential of $f$ with respect to the variable $R$ in a direction $X\in{\mathcal{T}(R)}$ at $R=\Pi_{\mathscr{M}}(\mathfrak{R})$ is the linear map

[TABLE]

Lemma A.3 implies that the Jacobian $\partial_{R,X}f$ has no zero eigenvalue and hence is invertible. The implicit function theorem ensures the existence of a diffeomorphism $\phi$ mapping an open neighborhood $\Omega_{E}\subset E$ of $\mathfrak{R}$ to an open neighborhood $\Omega_{\mathscr{M}}\subset\mathscr{M}$ of $R$, such that for any $\mathfrak{R}^{\prime}\in\Omega_{E}$, $\phi(\mathfrak{R}^{\prime})$ is the unique element of $\Omega_{\mathscr{M}}$ satisfying $f(\mathfrak{R}^{\prime},\phi(\mathfrak{R}^{\prime}))=0$. By continuity of the projection (Lemma A.1), one can assume, by replacing $\Omega_{E}$ with the open subset $\Omega_{E}\cap\Pi_{\mathscr{M}}^{-1}(\Omega_{\mathscr{M}})$, that $\Pi_{\mathscr{M}}(\Omega_{E})\subset\Omega_{\mathscr{M}}$. Then, the equality $f(\mathfrak{R}^{\prime},\Pi_{\mathscr{M}}(\mathfrak{R}^{\prime}))=0$ implies by uniqueness: $\phi(\mathfrak{R}^{\prime})=\Pi_{\mathscr{M}}(\mathfrak{R}^{\prime})$. Hence $\Pi_{\mathscr{M}}=\phi$ on $\Omega_{E}$, and, in particular, $\Pi_{\mathscr{M}}$ is differentiable. Finally, for a given $X\in E$, one can now solve Eq. 20 by projection onto the eigenvectors of $L_{R}(N)$ and obtain Eq. 26.

Appendix B Proof of Theorem 5.7

Lemma B.1.

For any $\mathfrak{R}\in\mathcal{M}_{l,m}$ satisfying $\sigma_{r}(\mathfrak{R})>\sigma_{r+1}(\mathfrak{R})$ and $\mathfrak{X}\in\mathcal{M}_{l,m}$ :

[TABLE]

Proof B.2.

This is a consequence of the fact that the maximum eigenvalue in the decomposition Eq. 26 is

[TABLE]

The following lemma can be found in [77] and as Theorem 2.6.1 in [27].

Lemma B.3.

For any points $R^{1},R^{2}\in\mathscr{M}$ the following estimate holds:

[TABLE]

where the norm of the left-hand side is the operator norm.

Remark B.4.

This result from [77] enhances the “curvature estimates” of Lemma 4.2 of [37]: it provides a global bound and hence avoids the smallness assumption on the initial truncation error. Note that such a bound always exists at every point of a smooth manifold (Definition 2.17 of [21]). A purely geometric analysis (Lemma 3.1 in [21]) may also be used to yield locally a sharper bound than Eq. 54 but with a larger constant $5/2$ instead of $2$ as a global estimate.

Proof B.5 (Proof of Theorem 5.7).

Denote $R^{*}(t)=\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ . Since $\dot{R}^{*}(t)=\mathrm{D}_{\dot{\mathfrak{R}}}\Pi_{\mathscr{M}}(\mathfrak{R}(t))$ , bounding Eq. 20 and using Eq. 2 and Lemma B.1 yields:

[TABLE]

Furthermore, by triangle inequality,

[TABLE]

Lemma B.3 (first equation) and the Lipschitz continuity of $\mathcal{L}$ (last two equations) then imply

[TABLE]

Finally, the following inequality is derived, combining all above equations together:

[TABLE]

An application of Gronwall’s Lemma (see Corollary 4.3 in [29]) yields Eq. 45.

Bibliography


[1] T. Abatzoglou, The metric projection on $C^{2}$ manifolds in Banach spaces, Journal of Approximation Theory, 26 (1979), pp. 204–211.
[2] P. Absil, J. Trumpf, R. Mahony, and B. Andrews, All roads lead to Newton: Feasible second-order methods for equality-constrained optimization, tech. report, Citeseer, 2009.
[3] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds, Princeton University Press, 2009.
[4] P.-A. Absil, R. Mahony, and J. Trumpf, An extrinsic look at the Riemannian Hessian, in Geometric Science of Information, Springer, 2013, pp. 361–368.
[5] P.-A. Absil and J. Malick, Projection-like retractions on matrix manifolds, SIAM Journal on Optimization, 22 (2012), pp. 135–158.
[6] G. Allaire, C. Dapogny, G. Delgado, and G. Michailidis, Multi-phase structural optimization via a level set method, ESAIM. Control, Optimisation and Calculus of Variations, 20 (2014), p. 576.
[7] L. Ambrosio, Geometric evolution problems, distance function and viscosity solutions, Springer, 2000.
[8] A. Bartel, M. Clemens, M. Günther, and E. J. W. ter Maten, Scientific Computing in Electrical Engineering, Springer International Publishing, 2016.
