Forecasting high-dimensional dynamics exploiting suboptimal embeddings
Shunya Okuno, Kazuyuki Aihara, Yoshito Hirata

TL;DR
This paper introduces a novel forecasting framework that leverages suboptimal embeddings optimized through combinatorial methods, outperforming existing approaches in high-dimensional nonlinear time series prediction across diverse datasets.
Contribution
The paper presents a new forecasting method using suboptimal embeddings optimized by combinatorial algorithms, improving prediction accuracy over traditional embedding selection techniques.
Findings
Outperforms existing frameworks on toy and real-world datasets.
Applicable to various data lengths and dimensions.
Effective across multiple fields like neuroscience, ecology, and finance.
Abstract
Delay embedding, a method for reconstructing dynamical systems from delay coordinates, is widely used as a model-free approach to forecasting nonlinear time series. When multivariate time series are observed, several existing frameworks can produce a single forecast by combining multiple forecasts derived from various embeddings. However, these frameworks do not always perform satisfactorily because they select embeddings randomly or by brute force and do not consider the diversity of the embeddings to be combined. Herein, we develop a forecasting framework that overcomes these problems. The framework exploits various "suboptimal embeddings" obtained by minimizing the in-sample error via combinatorial optimization. The framework achieves the best results among existing frameworks for sample toy datasets and a real-world flood dataset. We show that the framework is…
Taxonomy
Topics: Climate variability and models · Complex Systems and Time Series Analysis · Time Series Analysis and Forecasting
Supplementary information for “Forecasting high-dimensional dynamics exploiting suboptimal embeddings”
Shunya Okuno
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan
Disaster Reduction & Environmental Engineering Department, Kozo Keikaku Engineering Inc., 4-5-3 Chuo, Nakano-ku, Tokyo 164-0011, Japan
Kazuyuki Aihara
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan
International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
these authors contributed equally to this work
Yoshito Hirata
International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
Mathematics and Informatics Center, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
these authors contributed equally to this work
Effect of multiple optimizations
In the first step, we divide the objective function into several parts and solve a separate combinatorial optimization problem for each. This procedure yields diverse embeddings, which improves the performance of the combined forecast. Here, we numerically show the effect of these multiple optimizations.
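The block-wise idea can be illustrated with a small sketch. It assumes, hypothetically, that the in-sample errors of a finite set of candidate embeddings are already stored in a matrix `err[c, t]`, and that the objective is split by partitioning the training times into contiguous blocks. Neither the paper's actual split nor its evolution-strategy search is reproduced here; an exhaustive `argmin` over the candidates stands in for the combinatorial optimizer.

```python
import numpy as np


def select_single(err):
    """Single optimization: the candidate with the smallest total in-sample error."""
    return int(np.argmin(err.sum(axis=1)))


def select_per_block(err, n_blocks):
    """Multiple optimizations: one minimization per block of training times.

    err[c, t] is the in-sample forecast error of candidate embedding c at
    training time t (a hypothetical, precomputed input).
    """
    n_times = err.shape[1]
    # Split the objective (total error) into n_blocks contiguous sub-objectives.
    blocks = np.array_split(np.arange(n_times), n_blocks)
    return [int(np.argmin(err[:, idx].sum(axis=1))) for idx in blocks]


err = np.array([[1, 1, 5, 5],
                [5, 5, 1, 1],
                [2, 2, 2, 2]], dtype=float)
print(select_single(err))        # 2: a single compromise candidate
print(select_per_block(err, 2))  # [0, 1]: a different candidate per block
```

In this toy matrix the single optimization settles on the uniformly mediocre candidate, whereas the block-wise problems each pick a different specialist, which is the kind of diversity the combined forecast can exploit.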
We examined the effect of the multiple optimizations using the 10-dimensional Lorenz’96I model with a data length of 4000. We produced forecasts with two procedures: the proposed multiple optimizations, and a single optimization that minimizes the in-sample error over the whole training data. All other steps follow the proposed forecasting framework. The computational time is almost the same for both procedures.
As shown in Figures 1(a) and (b), the multiple optimizations produced more diverse embeddings than the single optimization. This result suggests that each optimization found different but useful embeddings. The multiple optimizations also reduced the number of forecasts to combine (Figures 1(c) and (d)). This is because the single optimization yielded similar embeddings, and combining similar forecasts empirically yields only a small performance improvement[1, 2]. Reducing both the number of combined forecasts and the computational time is important, especially for real-time applications.
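The text does not state how embedding diversity is measured in Figure 1. As one hypothetical measure, the sketch below represents each embedding as a set of `(variable, lag)` coordinates and averages the Jaccard distance over all pairs, so identical embeddings contribute 0 and disjoint ones contribute 1.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two non-empty coordinate sets."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)


def mean_pairwise_diversity(embeddings):
    """Average Jaccard distance over all pairs of embeddings (hypothetical measure)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


e1 = {(0, 0), (1, 0), (2, 1)}  # (variable, lag) coordinates
e2 = {(0, 0), (1, 0), (3, 2)}
print(jaccard_distance(e1, e2))              # 0.5
print(mean_pairwise_diversity([e1, e2, e1])) # about 0.333
```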
Application to low-dimensional dynamics
We applied the proposed forecasting framework to low-dimensional datasets: the Lorenz’63 equations[3] (three variables), the Rössler equations[4] (three variables), and the six-dimensional Lorenz’96I equations[5] (six variables). In all cases, we used a database of length 4000 and evaluated forecasts up to 10 steps ahead on 500 samples. See the next section for the detailed conditions.
As shown in Figure 2, the proposed framework did not always achieve the best performance. There was little difference between the proposed framework and state-dependent weighting (SDW) for the Lorenz’63 dataset (Figure 2(a)), and SDW performed best for the Rössler dataset (Figure 2(b)). In contrast, when we increased the number of variables to six, the proposed framework performed best (Figure 2(c)), and, as stated in the main text, it achieved much better results than the others with 10 or 20 variables.
Interestingly, SDW outperformed MVE for all of the low-dimensional toy models above, although SDW did not perform satisfactorily on the high-dimensional datasets, as stated in the main text. This result suggests that weighting based on past forecast performance and the corresponding state[6] works well for simple low-dimensional data. For high-dimensional data, on the other hand, weighting appears to offer little improvement, and a simple average is sufficient. In short, the proposed framework is preferable for complex, high-dimensional, and less noisy data, whereas weighting methods such as SDW are worth considering for simple low-dimensional data.
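To make the contrast concrete, here is a simplified sketch, not the exact scheme of Ref. 6. The MVE-style combination is a plain mean, while the state-dependent variant weights each forecast by the inverse of its mean in-sample error at the `k` past states closest to the current one. The arrays `past_states` and `past_errors`, the Euclidean distance, and the value of `k` are assumptions for illustration.

```python
import numpy as np


def simple_average(forecasts):
    """MVE-style combination: the unweighted mean of the forecasts."""
    return float(np.mean(forecasts))


def state_dependent_average(forecasts, past_states, past_errors, state, k=5, eps=1e-12):
    """Weight each forecast by 1 / (its mean error near `state`) (simplified SDW-like rule).

    forecasts:   (M,) current forecasts from M embeddings
    past_states: (T, d) past states (hypothetical state representation)
    past_errors: (T, M) absolute in-sample errors of each forecaster at those states
    state:       (d,) current state
    """
    dist = np.linalg.norm(past_states - state, axis=1)
    nearest = np.argsort(dist)[:k]                 # k closest past states
    local_err = past_errors[nearest].mean(axis=0)  # (M,) local performance
    w = 1.0 / (local_err + eps)                    # better locally -> larger weight
    w /= w.sum()                                   # weights sum to one
    return float(w @ np.asarray(forecasts, dtype=float))


forecasts = np.array([1.0, 3.0])
past_states = np.array([[0.0], [0.1], [5.0], [5.1]])
past_errors = np.array([[0.1, 1.0], [0.1, 1.0], [1.0, 0.1], [1.0, 0.1]])
print(simple_average(forecasts))  # 2.0
print(state_dependent_average(forecasts, past_states, past_errors, np.array([0.0]), k=2))
```

Near the state 0 the first forecaster has been more accurate, so the weighted combination moves toward its forecast (13/11, about 1.18), whereas the plain mean ignores the state entirely.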
Detailed conditions of the numerical experiments
Lorenz’96I equations
The Lorenz’96I equations[5] are expressed as 10-dimensional differential equations as follows:
$$\frac{dx_i}{dt} = \left(x_{i+1} - x_{i-2}\right) x_{i-1} - x_i + F, \qquad i = 1, 2, \ldots, 10,$$
where $F$ is a forcing constant and the index of $x_i$ is cyclic. We generated a 10-dimensional time series consisting of Lorenz’96I variables and random walks. We drew the initial condition of each variable from a normal distribution. We set the integration step to 0.001, recorded every 50th point (a sampling interval of 0.05), and discarded the initial transients. We forecasted up to 10 steps ahead. We set , and for . For the -ES algorithm, we set , the number of generations to 10, and the number of populations to 100.
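A minimal data-generation sketch for this setup, assuming NumPy and a classical fourth-order Runge-Kutta integrator (the integrator is not named above). The forcing F = 8 is a common chaotic choice, not the paper's value, which was lost in extraction; the split into 5 observed Lorenz’96I components and 5 random walks, the standard normal random-walk increments, the transient length, and the absence of observation noise are likewise hypothetical.

```python
import numpy as np


def lorenz96(x, F):
    """Right-hand side dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F (cyclic)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F


def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def make_l96_dataset(n_records, n_obs=5, n_walks=5, F=8.0, dt=0.001,
                     every=50, n_transient=10_000, seed=0):
    """Return an (n_records, n_obs + n_walks) array: observed Lorenz'96I
    components followed by random walks (the split is hypothetical)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(10)          # normally distributed initial condition

    def f(s):
        return lorenz96(s, F)

    for _ in range(n_transient):         # integrate through and discard transients
        x = rk4_step(f, x, dt)
    obs = np.empty((n_records, n_obs))
    for i in range(n_records):
        for _ in range(every):           # keep one point every `every` steps
            x = rk4_step(f, x, dt)
        obs[i] = x[:n_obs]
    walks = np.cumsum(rng.standard_normal((n_records, n_walks)), axis=0)
    return np.hstack([obs, walks])


data = make_l96_dataset(100)
print(data.shape)  # (100, 10)
```

The pure-Python loop is slow but adequate for a few thousand records; `x = F` for every component is a fixed point of the vector field, which gives a quick sanity check on the right-hand side.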
We compared the proposed framework with existing frameworks: randomly distributed embedding with an aggregation scheme (RDE), multiview embedding (MVE), and SDW with , considering up to four lags for MVE and SDW. Because the number of possible embeddings grows combinatorially, we randomly generated 1000 embeddings instead of enumerating all of them by brute force. Note that the total number of embedding evaluations is almost the same as that for the proposed framework. We also computed the single best embedding, found with the -ES algorithm by minimizing the total in-sample error over the whole training dataset.
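Random generation of candidate embeddings might look like the following sketch. It assumes each embedding is a set of `dim` distinct `(variable, lag)` coordinates with lags 0 to 3 (one reading of "up to four lags"); any further constraints imposed by RDE, MVE, or SDW, such as always including the target variable, are not modeled, and `dim = 3` is hypothetical.

```python
import random
from math import comb


def random_embeddings(n_vars, n_lags, dim, n, seed=0):
    """Draw n distinct embeddings, each a set of `dim` distinct (variable, lag) coordinates."""
    coords = [(v, lag) for v in range(n_vars) for lag in range(n_lags)]
    if n > comb(len(coords), dim):
        raise ValueError("requested more embeddings than exist")
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n:                      # resample until n distinct sets are found
        seen.add(frozenset(rng.sample(coords, dim)))
    return [sorted(e) for e in seen]


embs = random_embeddings(n_vars=10, n_lags=4, dim=3, n=1000)
print(len(embs))  # 1000
```

Deduplicating with `frozenset` keeps the sample free of repeated embeddings; the guard prevents an endless loop when more embeddings are requested than exist.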
Kuramoto–Sivashinsky equations
The Kuramoto–Sivashinsky equations[7, 8] are expressed as follows:
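$$\frac{\partial u}{\partial t} = -u \frac{\partial u}{\partial x} - \frac{\partial^2 u}{\partial x^2} - \frac{\partial^4 u}{\partial x^4}.$$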
We solved the equations with spatially periodic boundary conditions in the interval , where and the number of uniform grid points is 128. We generated a 20-dimensional time series consisting of the values at the first 10 grid points, sampled every 1.0 time unit, and 10 random walks. We set and considered up to five lags for the existing schemes (RDE, MVE, and SDW). We forecasted up to 10 steps ahead using the same parameter values, data length, and noise scales as in the Lorenz’96I example.
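A sketch of how such data could be generated with the fourth-order exponential time-differencing Runge-Kutta (ETDRK4) scheme of Kassam and Trefethen. The domain length L = 32π, the initial condition u(x, 0) = cos(x/16)(1 + sin(x/16)), and the time step h = 0.25 are assumptions taken from their standard example (the paper's interval was lost in extraction), as are the standard normal random-walk increments and the transient length.

```python
import numpy as np


def ks_etdrk4(n_samples, N=128, L=32 * np.pi, h=0.25, sample_dt=1.0, n_transient=100):
    """Solve u_t = -u u_x - u_xx - u_xxxx on a periodic domain [0, L) with ETDRK4
    and return u at the N grid points every sample_dt time units."""
    x = L * np.arange(N) / N
    u = np.cos(x / 16) * (1 + np.sin(x / 16))      # assumed initial condition
    v = np.fft.fft(u)
    k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)     # angular wavenumbers
    k[N // 2] = 0.0                                # drop the Nyquist mode
    lin = k**2 - k**4                              # Fourier symbol of -u_xx - u_xxxx
    E, E2 = np.exp(h * lin), np.exp(h * lin / 2)
    # ETDRK4 coefficients via a contour integral (M points on the unit circle).
    M = 16
    r = np.exp(1j * np.pi * (np.arange(1, M + 1) - 0.5) / M)
    LR = h * lin[:, None] + r[None, :]
    Q = h * np.real(np.mean((np.exp(LR / 2) - 1) / LR, axis=1))
    f1 = h * np.real(np.mean((-4 - LR + np.exp(LR) * (4 - 3 * LR + LR**2)) / LR**3, axis=1))
    f2 = h * np.real(np.mean((2 + LR + np.exp(LR) * (LR - 2)) / LR**3, axis=1))
    f3 = h * np.real(np.mean((-4 - 3 * LR - LR**2 + np.exp(LR) * (4 - LR)) / LR**3, axis=1))
    g = -0.5j * k                                  # -u u_x = -(u^2)_x / 2 in Fourier space

    def nonlinear(w):
        return g * np.fft.fft(np.real(np.fft.ifft(w)) ** 2)

    def step(v):
        Nv = nonlinear(v)
        a = E2 * v + Q * Nv
        Na = nonlinear(a)
        b = E2 * v + Q * Na
        Nb = nonlinear(b)
        c = E2 * a + Q * (2 * Nb - Nv)
        Nc = nonlinear(c)
        return E * v + Nv * f1 + 2 * (Na + Nb) * f2 + Nc * f3

    steps = int(round(sample_dt / h))
    for _ in range(n_transient * steps):           # discard transients
        v = step(v)
    out = np.empty((n_samples, N))
    for i in range(n_samples):
        for _ in range(steps):
            v = step(v)
        out[i] = np.real(np.fft.ifft(v))
    return out


def make_ks_dataset(n_samples, n_grid=10, n_walks=10, seed=0):
    """First n_grid grid points plus n_walks random walks (20 columns by default)."""
    rng = np.random.default_rng(seed)
    grid = ks_etdrk4(n_samples)[:, :n_grid]
    walks = np.cumsum(rng.standard_normal((n_samples, n_walks)), axis=0)
    return np.hstack([grid, walks])


print(make_ks_dataset(50).shape)  # (50, 20)
```

ETDRK4 treats the stiff linear part exactly and evaluates its coefficients by a contour integral to avoid cancellation near zero; a simpler integrator would need a much smaller time step.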
Flood dataset
The flood forecasting competition dataset “Artificial Neural Network Experiment (ANNEX 2005/2006)”[9] contains river stage and rainfall data for three periods: 1993-10-01 to 1994-03-31, 1994-10-01 to 1995-03-31, and 1995-10-01 to 1996-03-31. We forecasted the river stage 6, 12, 18, and 24 h ahead using nine variables: the river stages at the target site and three upstream sites, and the readings of five rain gauges. All data were sampled every 6 h, so these horizons correspond to 1 to 4 steps ahead. We used the period from 1994-10-01 to 1995-03-31 for testing and the other two periods for training. We set , and for . For the -ES algorithm, we set , the number of generations to 20, and the number of populations to 100.
Lorenz’63 equations
The Lorenz’63 equations[3] are expressed as three-dimensional differential equations as follows:
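$$\frac{dx}{dt} = \sigma (y - x), \qquad \frac{dy}{dt} = x (r - z) - y, \qquad \frac{dz}{dt} = x y - b z.$$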
We generated a three-dimensional time series consisting of two Lorenz’63 variables and a random walk. We set the integration step to 0.001 and recorded every 100th point (a sampling interval of 0.1) with and . Initial transients were discarded. We forecasted up to 10 steps ahead. We set , and for . For the -ES algorithm, we set , the number of generations to 10, and the number of populations to 100.
We compared the proposed framework with MVE and SDW, each with up to five lags, and with the single best embedding found by the -ES algorithm minimizing the total in-sample error over the whole training dataset. We did not carry out calculations with RDE because we could not prepare a sufficient number of “nondelay embeddings” for such low-dimensional data: with only three variables, very few combinations of distinct variables exist.
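The shortage of nondelay embeddings can be checked by counting, under the simplifying assumption that an embedding is any set of `dim` distinct coordinates. Nondelay embeddings use only lag-0 values of distinct variables, whereas delay embeddings may mix lags. The lag range (five lags, 0 to 4) and `dim = 3` below are hypothetical.

```python
from math import comb


def n_nondelay(n_vars, dim):
    """Nondelay embeddings: choose `dim` distinct variables, all at lag 0."""
    return comb(n_vars, dim)


def n_delay(n_vars, n_lags, dim):
    """Delay embeddings: choose `dim` distinct (variable, lag) coordinates."""
    return comb(n_vars * n_lags, dim)


print(n_nondelay(3, 3), n_delay(3, 5, 3))    # 1 455   (three-variable datasets)
print(n_nondelay(10, 3), n_delay(10, 5, 3))  # 120 19600
```

With three variables there is only one three-dimensional nondelay embedding, far too few to aggregate, while delay coordinates still offer hundreds of candidates.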
Rössler equations
The Rössler equations[4] are expressed as three-dimensional differential equations as follows:
$$\frac{dx}{dt} = -y - z, \qquad \frac{dy}{dt} = x + a y, \qquad \frac{dz}{dt} = b + z (x - c).$$
We generated a three-dimensional time series consisting of two Rössler variables and a random walk. We set the integration step to 0.001 and recorded every 500th point (a sampling interval of 0.5) with , and . Initial transients were discarded. We forecasted up to 10 steps ahead under the same conditions as for the Lorenz’63 equations.
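A sketch for the two three-variable systems, assuming the classical parameter values σ = 10, r = 28, b = 8/3 for Lorenz’63 and a = b = 0.2, c = 5.7 for Rössler (the values used in the paper were lost in extraction), RK4 with step 0.001, standard normal random-walk increments, and that the first two variables are the observed ones (hypothetical).

```python
import numpy as np


def lorenz63(s, sigma=10.0, r=28.0, b=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (r - z) - y, x * y - b * z])


def rossler(s, a=0.2, b=0.2, c=5.7):
    x, y, z = s
    return np.array([-y - z, x + a * y, b + z * (x - c)])


def rk4_step(f, s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def sample_with_walk(f, s0, n_records, every, dt=0.001, n_transient=10_000,
                     observe=(0, 1), seed=0):
    """Observed components of the flow (hypothetical choice) plus one random walk."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s0, dtype=float)
    for _ in range(n_transient):            # discard initial transients
        s = rk4_step(f, s, dt)
    obs = np.empty((n_records, len(observe)))
    for i in range(n_records):
        for _ in range(every):              # 100 steps -> 0.1, 500 steps -> 0.5
            s = rk4_step(f, s, dt)
        obs[i] = s[list(observe)]
    walk = np.cumsum(rng.standard_normal(n_records))
    return np.column_stack([obs, walk])


l63 = sample_with_walk(lorenz63, [1.0, 1.0, 1.0], n_records=50, every=100)
ros = sample_with_walk(rossler, [1.0, 1.0, 1.0], n_records=50, every=500)
print(l63.shape, ros.shape)  # (50, 3) (50, 3)
```

The step-by-step Python loop is slow at the full data length (especially for Rössler with 500 steps per record); it is meant to show the sampling scheme rather than to be efficient.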
Six-dimensional Lorenz’96I equations
The six-dimensional Lorenz’96I dataset consists of Lorenz’96I variables and random walks. The conditions for numerical integration were the same as for the 10-dimensional Lorenz’96I equations. We forecasted up to 10 steps ahead under the same conditions as for the Lorenz’63 equations.
Method of analogues
We applied a variation of the method of analogues[10] to obtain an $h$-steps-ahead forecast at time $t$. The method of analogues takes the forward paths of neighboring trajectories as the forecast. Here, we forecast from the set of delay coordinates $\mathbf{v}(t)$. We first search the database to find the neighboring points of $\mathbf{v}(t)$. Then, the method of analogues gives the forecast $\hat{x}(t+h)$ in terms of the set $\mathcal{N}(t)$ of neighboring time indices as follows:
$$\hat{x}(t+h) = \sum_{s \in \mathcal{N}(t)} w_s \, x(s+h),$$
where $w_s$ is a weight satisfying $\sum_{s \in \mathcal{N}(t)} w_s = 1$. We employ throughout this paper.
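A minimal NumPy sketch of this neighbor-average forecast, under stated assumptions: neighbors are the `k = 4` nearest delay vectors in Euclidean distance, the database contains only times whose h-step future is already observed at `t_now`, and the weights are equal, which is one choice satisfying the sum-to-one constraint (the paper's own weight choice is not reproduced). The names `embed` and `analogue_forecast` are hypothetical.

```python
import numpy as np


def embed(series, coords):
    """Rows are delay vectors [series[t - lag, var] for (var, lag) in coords]
    for t = max_lag, ..., T - 1; also returns max_lag (the time of row 0)."""
    max_lag = max(lag for _, lag in coords)
    T = series.shape[0]
    cols = [series[max_lag - lag:T - lag, var] for var, lag in coords]
    return np.column_stack(cols), max_lag


def analogue_forecast(series, coords, target, h, t_now, k=4):
    """h-steps-ahead forecast of series[:, target] from the delay vector at t_now."""
    X, offset = embed(series, coords)
    times = np.arange(offset, series.shape[0])
    query = X[t_now - offset]
    # Database: times whose h-step future is already observed at t_now.
    cand = times[times + h <= t_now]
    dist = np.linalg.norm(X[cand - offset] - query, axis=1)
    nbrs = cand[np.argsort(dist)[:k]]          # k nearest neighbors
    w = np.full(len(nbrs), 1.0 / len(nbrs))    # equal weights summing to one (assumption)
    return float(w @ series[nbrs + h, target])


t = np.arange(200)
series = np.column_stack([np.sin(2 * np.pi * t / 20), np.cos(2 * np.pi * t / 20)])
pred = analogue_forecast(series, [(0, 0), (0, 2), (1, 1)], target=0, h=3, t_now=150)
print(pred, np.sin(2 * np.pi * 153 / 20))  # the two values agree for this periodic series
```

For the period-20 test signal, the nearest analogues are exact repeats of the current state, so their forward paths reproduce the true future; on chaotic data the neighbors only approximate the current state and the forecast error grows with `h`.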
Note that we can apply more advanced methods such as that in Refs. 11, 12 or any other regression methods with our proposed forecasting framework.
References
- [1] Sollich, P. & Krogh, A. Learning with ensembles: how over-fitting can be useful. In Advances in Neural Information Processing Systems, 190–196 (1996).
- [2] Kuncheva, L. I. & Whitaker, C. J. Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy. Machine Learning 51, 181–207, DOI: 10.1023/A:1022859003006 (2003).
- [3] Lorenz, E. N. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences 20, 130–141 (1963).
- [4] Rössler, O. E. An equation for continuous chaos. Physics Letters A 57, 397–398 (1976).
- [5] Lorenz, E. N. Predictability: a problem partly solved. In Seminar on Predictability, 1–18 (ECMWF, Reading, England, 1996).
- [6] Okuno, S., Aihara, K. & Hirata, Y. Combining multiple forecasts for multivariate time series via state-dependent weighting. Chaos: An Interdisciplinary Journal of Nonlinear Science 29, 033128, DOI: 10.1063/1.5057379 (2019).
- [7] Kuramoto, Y. & Tsuzuki, T. Persistent Propagation of Concentration Waves in Dissipative Media Far from Thermal Equilibrium. Progress of Theoretical Physics 55, 356–369, DOI: 10.1143/PTP.55.356 (1976).
- [8] Sivashinsky, G. I. Nonlinear analysis of hydrodynamic instability in laminar flames-I. Derivation of basic equations. Acta Astronautica 4, 1177–1206, DOI: 10.1016/0094-5765(77)90096-0 (1977).
