Influence of database noises to machine learning for spatiotemporal chaos
Yu Yang, Shijie Qin, Shijun Liao

TL;DR
This paper demonstrates that machine learning predictions of spatiotemporal chaos depend heavily on the quality of the training data, and emphasizes the importance of training sets with negligible numerical noise, such as those obtained via the CNS strategy.
Contribution
It shows that numerical noise in the training database significantly changes ML predictions of chaos, even in their statistics, and argues that clean training data is necessary for reliable results.
Findings
ML trained on clean data yields more accurate predictions.
Noisy data significantly degrades ML performance.
Clean numerical simulation (CNS) provides high-quality benchmark data.
Abstract
A new strategy, namely the "clean numerical simulation" (CNS), was proposed (J. Computational Physics, 418:109629, 2020) to gain reliable/convergent simulations (with negligible numerical noises) of spatiotemporal chaotic systems in a long enough interval of time, which provide a benchmark solution for comparison. Here we illustrate that machine learning (ML) can always give good enough fitting predictions of a spatiotemporal chaos by using, separately, two quite different training sets: one is the "clean database" given by the CNS with negligible numerical noises, the other is the "polluted database" given by the traditional algorithms in single/double precision with considerably large numerical noises. However, even in statistics, the ML predictions based on the "polluted database" are quite different from those based on the "clean database". It illustrates that the database noises…
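To make the distinction between a "clean" and a "polluted" trajectory concrete, here is a minimal sketch in Python. It is not the CNS algorithm and not the spatiotemporal system studied in the paper: as an assumption for illustration, it uses the one-dimensional logistic map as a stand-in chaotic system, and the standard-library `decimal` module as a stand-in for a low-noise reference. The function names (`logistic_float`, `logistic_decimal`, `first_divergence`) are hypothetical.

```python
from decimal import Decimal, localcontext


def logistic_float(x0, steps):
    """Iterate x -> 4x(1-x) in ordinary double precision (the 'polluted' run)."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x = 4.0 * x * (1.0 - x)
        xs.append(x)
    return xs


def logistic_decimal(x0_str, steps, digits):
    """Iterate the same map with `digits` significant decimal digits.

    Assumption: high-precision arithmetic stands in for a low-noise
    reference solution; it is not the CNS method itself.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        x = Decimal(x0_str)
        four, one = Decimal(4), Decimal(1)
        xs = [x]
        for _ in range(steps):
            x = four * x * (one - x)
            xs.append(x)
    return xs


def first_divergence(a, b, tol):
    """Return the first step index where the two trajectories differ by more than tol."""
    for n, (u, v) in enumerate(zip(a, b)):
        if abs(float(u) - float(v)) > tol:
            return n
    return None


if __name__ == "__main__":
    steps = 100
    polluted = logistic_float(0.1, steps)
    clean = logistic_decimal("0.1", steps, digits=200)
    check = logistic_decimal("0.1", steps, digits=400)

    # Two high-precision runs should still agree closely at the final step,
    # which is a simple convergence check on the reference trajectory.
    with localcontext() as ctx:
        ctx.prec = 400
        print("reference runs agree:", abs(clean[-1] - check[-1]) < Decimal("1e-50"))

    # The double-precision run departs from the reference well before the end.
    print("double precision departs at step:", first_divergence(polluted, clean, 0.1))
```

Comparing the 200-digit and 400-digit runs is one simple way to check that the reference trajectory has not yet been contaminated by round-off over the chosen interval. A training set sampled from the double-precision run after its divergence point would contain values that no longer follow the true trajectory, which is the kind of database noise the abstract refers to.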
Taxonomy
Topics: Scientific Research and Discoveries · Astronomy and Astrophysical Research · Advanced Data Storage Technologies
