Forecasting high-dimensional dynamics exploiting suboptimal embeddings

Shunya Okuno; Kazuyuki Aihara; Yoshito Hirata

arXiv:1907.01552·stat.ML·July 4, 2019

TL;DR

This paper introduces a novel forecasting framework that leverages suboptimal embeddings optimized through combinatorial methods, outperforming existing approaches in high-dimensional nonlinear time series prediction across diverse datasets.

Contribution

The paper presents a new forecasting method using suboptimal embeddings optimized by combinatorial algorithms, improving prediction accuracy over traditional embedding selection techniques.

Findings

01 Outperforms existing frameworks on toy and real-world datasets.

02 Applicable to various data lengths and dimensions.

03 Effective across multiple fields like neuroscience, ecology, and finance.

Abstract

Delay embedding---a method for reconstructing dynamical systems by delay coordinates---is widely used to forecast nonlinear time series as a model-free approach. When multivariate time series are observed, several existing frameworks can be applied to yield a single forecast combining multiple forecasts derived from various embeddings. However, the performance of these frameworks is not always satisfactory because they randomly select embeddings or use brute force and do not consider the diversity of the embeddings to combine. Herein, we develop a forecasting framework that overcomes these existing problems. The framework exploits various "suboptimal embeddings" obtained by minimizing the in-sample error via combinatorial optimization. The framework achieves the best results among existing frameworks for sample toy datasets and a real-world flood dataset. We show that the framework is…

Equations

\frac{d x _{i}}{d t} = x_{i - 1} (x_{i + 1} - x_{i - 2}) - x_{i} + F, i = 0, 1, 2, ..., 9,

\frac{\partial y}{\partial t} = - \frac{\partial ^{2} y}{\partial x ^{2}} - \frac{\partial ^{4} y}{\partial x ^{4}} - u \frac{\partial y}{\partial x} .

\frac{d x}{d t} = p (y - x), \frac{d y}{d t} = x (r - z) - y, \frac{d z}{d t} = x y - b z .

\frac{d x}{d t} = - y - z, \frac{d y}{d t} = x + a y, \frac{d z}{d t} = b + z (x - c) .

\hat{v} (t + p \mid t) = \sum_{t^{\prime} \in \mathcal{I} (t)} \lambda (t^{\prime}) v (t^{\prime} + p),

Taxonomy

Topics: Climate variability and models · Complex Systems and Time Series Analysis · Time Series Analysis and Forecasting

Full text

Supplementary information for “Forecasting high-dimensional dynamics exploiting suboptimal embeddings”

Shunya Okuno

Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan

Disaster Reduction & Environmental Engineering Department, Kozo Keikaku Engineering Inc., 4-5-3 Chuo, Nakanoku, Tokyo 164-0011, Japan

[email protected]

Kazuyuki Aihara

Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan

International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

these authors contributed equally to this work

Yoshito Hirata

International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

Mathematics and Informatics Center, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

these authors contributed equally to this work

Effect of multiple optimizations

In the first step of the framework, we divide the objective function into $K$ problems and solve each as a separate combinatorial optimization problem. This procedure yields diverse embeddings, which improves the performance of the combined forecast. Here, we show the effect of these multiple optimizations numerically.

We examined the effect of the $K$-times optimization using the 10-dimensional Lorenz’96I model with a data length of 4000. We forecasted the data by two methods: the proposed $K$ optimizations, and a single optimization that minimizes the in-sample error over the whole training dataset. All other steps follow the proposed forecasting framework. Note that the computational time is almost the same for both methods.

As shown in Figures 1(a) and (b), the procedure with $K$ optimizations enhanced the diversity of embeddings compared with the single optimization. This result suggests that each of the $K$ optimization processes found a different but useful embedding. The $K$-times optimization also reduced the number of forecasts to combine (Figures 1(c) and (d)). This is because the single optimization process yielded similar embeddings, and combining similar forecasts empirically yields only a small performance improvement[1, 2]. Reductions in the number of combined forecasts and in the computational time are important, especially for real-time applications.

Application to low-dimensional dynamics

We applied the proposed forecasting framework to low-dimensional datasets, namely the Lorenz’63 equations[3] (three variables), the Rössler equations[4] (three variables), and the six-dimensional Lorenz’96I equations[5] (six variables). We used a data length of 4000 for the database and evaluated forecasts up to 10 steps ahead with 500 samples in all cases. See the next section for the detailed conditions.

As shown in Figure 2, the proposed framework did not always achieve the best performance. There is little difference between the proposed framework and state-dependent weighting (SDW) for the Lorenz’63 dataset (Figure 2(a)), and SDW achieved the best performance for the Rössler dataset (Figure 2(b)). In contrast, when we increased the number of variables to six, the proposed framework yielded the best performance (Figure 2(c)), and as stated in the main text, the proposed framework achieved much better results than the others for cases with 10 or 20 variables.

Interestingly, SDW outperformed MVE for all of the low-dimensional toy models mentioned above, although its performance for the high-dimensional datasets was not satisfactory, as stated in the main text. This result means that weighting based on the forecast performance and its corresponding state[6] worked well with simple low-dimensional data. On the other hand, it is difficult to improve the performance by weighting for high-dimensional data, and a simple average is sufficient for such cases. In short, the proposed framework is preferable for complex, high-dimensional, and less noisy data, whereas weighting methods such as SDW are worth considering for simple low-dimensional data.

Detailed conditions of the numerical experiments

Lorenz’96I equations

The Lorenz’96I equations[5] are expressed as 10-dimensional differential equations as follows:

$$\frac{dx_{i}}{dt}=x_{i-1}(x_{i+1}-x_{i-2})-x_{i}+F,\quad i=0,1,2,\ldots,9,$$

where $F$ is a forcing variable and $i$ is cyclic. We generated 10-dimensional time series: Lorenz’96I’s $x_{0}$, $x_{1}$, $x_{2}$, $x_{3}$, and $x_{4}$ and random walks $x_{5}$, $x_{6}$, $x_{7}$, $x_{8}$, and $x_{9}$. We drew the initial condition of each variable from a normal distribution. We set the integration step to 0.001 and recorded every 50th point. Note that the initial transients were discarded. We forecasted $x_{0}$ up to 10 steps ahead. We set $K=10$, $M=3$, $\theta=3$, and $h_{i}(0)=1$, $h_{i}(1)=\rho\ \forall i$ for $\rho\in\{0.0,-0.2,-0.4,-0.6,-0.8,-1.0\}$. For the $(\mu+\lambda)$-ES algorithm, we set $\mu=50$, $\lambda=100$, the number of generations to 10, and the number of populations to 100.
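
The data generation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the fourth-order Runge-Kutta integrator, the forcing value `forcing=8.0`, the transient length `n_transient_steps`, and the random-walk step scale `walk_scale` are assumptions, because the text does not specify them, and all function names are hypothetical.

```python
import numpy as np


def lorenz96_rhs(x, forcing):
    """dx_i/dt = x_{i-1} (x_{i+1} - x_{i-2}) - x_i + F, with cyclic index i."""
    return np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + forcing


def rk4_step(x, dt, forcing):
    """One classical Runge-Kutta step (the integrator is an assumption)."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_rhs(x + dt * k3, forcing)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def make_lorenz96_dataset(n_samples, dim=10, n_observed=5, n_walks=5,
                          forcing=8.0, dt=0.001, record_every=50,
                          n_transient_steps=5_000, walk_scale=1.0, seed=0):
    """Return an (n_samples, n_observed + n_walks) array.

    Columns 0..n_observed-1 are Lorenz'96 variables x_0, x_1, ...;
    the remaining columns are independent Gaussian random walks.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)           # normally distributed initial condition
    for _ in range(n_transient_steps):     # discard the initial transient
        x = rk4_step(x, dt, forcing)
    observed = np.empty((n_samples, n_observed))
    for k in range(n_samples):
        for _ in range(record_every):      # keep one point per `record_every` steps
            x = rk4_step(x, dt, forcing)
        observed[k] = x[:n_observed]
    steps = walk_scale * rng.standard_normal((n_samples, n_walks))
    walks = np.cumsum(steps, axis=0)       # random walks as cumulative sums
    return np.hstack([observed, walks])


data = make_lorenz96_dataset(n_samples=20)  # the text uses a data length of 4000
print(data.shape)
```

Setting `dim=6`, `n_observed=3`, and `n_walks=3` would give the layout of the six-dimensional dataset under the same assumptions.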

We compared the proposed framework with existing frameworks, namely randomly distributed embedding with an aggregation scheme (RDE), multiview embedding (MVE), and SDW, with $E=4$ and up to four lags for MVE and SDW. Because the number of possible embeddings increases combinatorially, we randomly generated 1000 embeddings instead of using brute-force calculation. Note that the total number of embedding evaluations is almost the same as that for the proposed framework. We also computed the single-best embedding via the $(\mu+\lambda)$-ES algorithm to minimize the total in-sample error within the whole training dataset.
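
To make the $(\mu+\lambda)$ selection rule concrete, the following is a generic sketch of a $(\mu+\lambda)$ evolution strategy over binary masks, where each bit could mark whether a candidate delay coordinate is included in an embedding. It is not the authors' implementation: the mutation-only variation (bit flips with probability `flip_prob`), the random initialization, and the toy error function are all assumptions, and `error_fn` stands in for the in-sample forecast error.

```python
import numpy as np


def mu_plus_lambda_es(error_fn, n_bits, mu=50, lam=100, n_generations=10,
                      flip_prob=0.1, seed=0):
    """Minimize error_fn over boolean masks with a (mu + lambda) strategy.

    Returns the mu surviving masks and their errors, sorted by error.
    """
    rng = np.random.default_rng(seed)
    parents = rng.random((mu, n_bits)) < 0.5            # random initial masks
    parent_err = np.array([error_fn(m) for m in parents], dtype=float)
    order = np.argsort(parent_err, kind="stable")
    parents, parent_err = parents[order], parent_err[order]
    for _ in range(n_generations):
        picks = rng.integers(0, mu, size=lam)            # choose a parent per child
        flips = rng.random((lam, n_bits)) < flip_prob    # bit-flip mutation
        offspring = parents[picks] ^ flips
        off_err = np.array([error_fn(m) for m in offspring], dtype=float)
        # "+" selection: parents compete with offspring, so the best never gets worse
        pool = np.vstack([parents, offspring])
        pool_err = np.concatenate([parent_err, off_err])
        keep = np.argsort(pool_err, kind="stable")[:mu]
        parents, parent_err = pool[keep], pool_err[keep]
    return parents, parent_err


# Toy objective (hypothetical): Hamming distance to a hidden target mask.
target = np.random.default_rng(42).random(20) < 0.5


def toy_error(mask):
    return float(np.sum(mask != target))


population, errors = mu_plus_lambda_es(toy_error, n_bits=20, seed=1)
print(errors[0], population[0].astype(int))
```

Because parents survive alongside their offspring, the returned population holds several low-error candidates rather than only the single best one.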

Kuramoto–Sivashinsky equations

The Kuramoto–Sivashinsky equations[7, 8] are expressed as follows:

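$$\frac{\partial y}{\partial t}=-\frac{\partial^{2}y}{\partial x^{2}}-\frac{\partial^{4}y}{\partial x^{4}}-u\frac{\partial y}{\partial x}.$$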

We computed $y(x,t)$ with spatially periodic boundary conditions on the interval $[0,L]$, where $L=22$, using a uniform grid of 128 points. We generated 20-dimensional time series: the values at the first 10 grid points of the Kuramoto–Sivashinsky equations, $x_{0},x_{1},\ldots,x_{9}$, with a sampling time of 1.0, and 10 random walks $x_{10},x_{11},\ldots,x_{19}$. We set $E=5$ and considered up to five lags for the existing schemes (RDE, MVE, and SDW). We forecasted $x_{0}$ up to 10 steps ahead using the same parameter values, data length, and noise scales as those in the Lorenz’96I example.

Flood dataset

The flood forecasting competition dataset “Artificial Neural Network Experiment (ANNEX 2005/2006)”[9] contains river stage and rainfall data for three periods: 1993-10-01 to 1994-03-31, 1994-10-01 to 1995-03-31, and 1995-10-01 to 1996-03-31. We forecasted the river stage $Q$ at 6, 12, 18, and 24 h ahead using nine variables: the river stages of the target ($Q$) and three upstream sites ($US1$, $US2$, and $US3$), and five rain gauges ($RG1$, $RG2$, $RG3$, $RG4$, and $RG5$). All data were sampled every 6 h. We used the period of 1994-10-01 to 1995-03-31 for testing and the others for training. We set $K=6$, $M=3$, $\theta=3$, and $h_{i}(0)=1$, $h_{i}(1)=\rho\ \forall i$ for $\rho\in\{0.0,-1.0\}$. For the $(\mu+\lambda)$-ES algorithm, we set $\mu=50$, $\lambda=100$, the number of generations to 20, and the number of populations to 100.

Lorenz’63 equations

The Lorenz’63 equations[3] are expressed as three-dimensional differential equations as follows:

$$\frac{dx}{dt}=p(y-x),\quad\frac{dy}{dt}=x(r-z)-y,\quad\frac{dz}{dt}=xy-bz.$$

We generated three-dimensional time series: Lorenz’63’s $x$ and $y$ and a random-walk series. We set the integration step to 0.001 and recorded every 100th point with $p=10$, $b=8/3$, and $r=28$. Note that the initial transients were discarded. We forecasted $x$ up to 10 steps ahead. We set $K=10$, $M=3$, $\theta=3$, and $h_{i}(0)=1$, $h_{i}(1)=\rho\ \forall i$ for $\rho\in\{0.0,-0.2,-0.4,-0.6,-0.8,-1.0\}$. For the $(\mu+\lambda)$-ES algorithm, we set $\mu=50$, $\lambda=100$, the number of generations to 10, and the number of populations to 100.

We compared the proposed framework with MVE and SDW with $E=4$ up to five lags and the single-best embedding via the $(\mu+\lambda)$ -ES algorithm to minimize the total in-sample error within the whole training dataset. Note that we did not carry out calculations with RDE because we were not able to prepare a sufficient number of “nondelay embeddings” for this type of low-dimensional data.

Rössler equations

The Rössler equations[4] are expressed as three-dimensional differential equations as follows:

$$\frac{dx}{dt}=-y-z,\quad\frac{dy}{dt}=x+ay,\quad\frac{dz}{dt}=b+z(x-c).$$

We generated three-dimensional time series: Rössler’s $x$ and $y$ and a random-walk series. We set the integration step to 0.001 and recorded every 500th point with $a=0.36$, $b=0.4$, and $c=4.5$. Note that the initial transients were discarded. We forecasted $x$ up to 10 steps ahead under the same conditions as those of the Lorenz’63 equations.

Six-dimensional Lorenz’96I equations

The six-dimensional Lorenz’96I dataset contains the six-dimensional Lorenz’96I series $x_{0}$, $x_{1}$, and $x_{2}$ and random walks $x_{3}$, $x_{4}$, and $x_{5}$. The conditions for numerical integration were the same as those of the 10-dimensional Lorenz’96I equations. We forecasted $x_{0}$ up to 10 steps ahead under the same conditions as those of the Lorenz’63 equations.

Method of analogues

We applied a variation of the method of analogues[10] to obtain a $p$-steps-ahead forecast $\hat{y}_{f}(t+p|t)$ at time $t$. The method of analogues takes the forward paths of neighboring trajectories as the forecast. Here, we forecast $y_{f}(t+p|t)$ from the set of delay coordinates $\{v(t)\mid t\in\mathcal{T}_{train}\}$. We first search the database to find the neighboring points of $v(t)$. Then, the method of analogues gives $\hat{v}(t+p|t)$ in terms of the set of neighboring time indices $\mathcal{I}(t)$ as follows:

$$\hat{v}(t+p|t)=\sum_{t^{\prime}\in\mathcal{I}(t)}\lambda(t^{\prime})v(t^{\prime}+p),$$

where $\lambda(t^{\prime})\in\mathbb{R}_{+}$ is a weight satisfying $\sum_{t^{\prime}\in\mathcal{I}(t)}\lambda(t^{\prime})=1$. We employ $\lambda(t^{\prime})\propto\|v(t^{\prime})-v(t)\|^{2}_{2}$ throughout this paper.

Note that we can apply more advanced methods such as that in Refs. 11, 12 or any other regression methods with our proposed forecasting framework.
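
As a rough illustration of the forecast rule above, the sketch below builds delay coordinates from a multivariate series, finds the nearest neighbors of the query state, and averages their states $p$ steps later. It rests on several assumptions: the helper names, the number of neighbors, and the default uniform weights are placeholders rather than the paper's settings, and any other weighting can be passed through the hypothetical `weight_fn` argument, which receives the neighbor distances and whose output is normalized to sum to one.

```python
import numpy as np


def delay_coordinates(series, lags):
    """Build delay vectors from a (T, n_vars) series.

    `lags` is a list of (variable_index, lag) pairs. Row j of the result is
    v(t) for t = j + max_lag, with components series[t - lag, variable].
    """
    series = np.asarray(series, dtype=float)
    max_lag = max(lag for _, lag in lags)
    n_times = series.shape[0]
    columns = [series[max_lag - lag:n_times - lag, var] for var, lag in lags]
    return np.column_stack(columns)


def analogue_forecast(v_train, v_query, p, n_neighbors=4, weight_fn=None):
    """Weighted average of the neighbors' states p steps ahead."""
    n_usable = len(v_train) - p            # neighbors need a known future state
    dist = np.linalg.norm(v_train[:n_usable] - v_query, axis=1)
    neighbors = np.argsort(dist)[:n_neighbors]
    if weight_fn is None:
        weights = np.ones(len(neighbors))  # uniform weights (an assumption)
    else:
        weights = np.asarray(weight_fn(dist[neighbors]), dtype=float)
    weights = weights / weights.sum()      # normalize so the weights sum to one
    return weights @ v_train[neighbors + p]


# Usage on a periodic toy signal: forecast one step beyond the last sample.
t = np.arange(400)
series = np.sin(2.0 * np.pi * t / 20.0)[:, None]
v = delay_coordinates(series, lags=[(0, 0), (0, 1)])
forecast = analogue_forecast(v[:-1], v[-1], p=1)
print(forecast[0])
```

When the first entry of `lags` is the target variable at lag 0, the first component of the returned vector is the forecast of that variable.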

Bibliography

[1] Sollich, P. & Krogh, A. Learning with ensembles: how over-fitting can be useful. In Advances in neural information processing systems, 190–196 (1996).
[2] Kuncheva, L. I. & Whitaker, C. J. Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy. Machine Learning 51, 181–207, DOI: 10.1023/A:1022859003006 (2003).
[3] Lorenz, E. N. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences 20, 130–141 (1963).
[4] Rössler, O. E. An equation for continuous chaos. Physics Letters A 57, 397–398 (1976).
[5] Lorenz, E. N. Predictability: a problem partly solved. In Seminar on Predictability, 1–18 (ECMWF, Reading, England, 1996).
[6] Okuno, S., Aihara, K. & Hirata, Y. Combining multiple forecasts for multivariate time series via state-dependent weighting. Chaos: An Interdisciplinary Journal of Nonlinear Science 29, 33128, DOI: 10.1063/1.5057379 (2019).
[7] Kuramoto, Y. & Tsuzuki, T. Persistent Propagation of Concentration Waves in Dissipative Media Far from Thermal Equilibrium. Progress of Theoretical Physics 55, 356–369, DOI: 10.1143/PTP.55.356 (1976).
[8] Sivashinsky, G. I. Nonlinear analysis of hydrodynamic instability in laminar flames-I. Derivation of basic equations. Acta Astronautica 4, 1177–1206, DOI: 10.1016/0094-5765(77)90096-0 (1977).