PaEBack: Pareto-Efficient Backsubsampling for Time Series Data
Xinyu Zhang, Sujit Ghosh

TL;DR
This paper introduces PaEBack, a method to determine the minimal recent data needed for near-optimal forecasting accuracy in time series, supported by theoretical and numerical evidence.
Contribution
The paper proposes PaEBack, a novel approach to estimate the recent data fraction required for effective time series prediction, with theoretical justification for AR models.
Findings
A small recent data subset can achieve near-optimal prediction accuracy.
PaEBack applies effectively even with model misspecification.
The method is supported by theoretical and numerical validation.
Abstract
Time series forecasting has been a quintessential topic in data science, but forecasting models have traditionally relied on extensive historical data. In this paper, we address a practical question: how much recent historical data is required to attain a targeted percentage of statistical prediction efficiency compared to the full time series? We propose the Pareto-Efficient Backsubsampling (PaEBack) method to estimate the percentage of the most recent data needed to achieve the desired level of prediction accuracy. We provide a theoretical justification based on asymptotic prediction theory for AutoRegressive (AR) models. Through several numerical illustrations, we also show how PaEBack applies to some recently developed machine learning forecasting methods, even when the models might be misspecified. The main conclusion is that only a fraction of the most…
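The question posed in the abstract can be illustrated with a small sketch. The code below is not the authors' PaEBack procedure; it is a simplified empirical stand-in that assumes an AR(p) model fitted by ordinary least squares and defines "efficiency" as the ratio of holdout mean squared error from the full-history fit to that from a fit on only the most recent fraction of the data. All function names, the simulated AR(1) series, the candidate grid of fractions, and the 90% target are assumptions made for illustration.

```python
import numpy as np


def simulate_ar1(n, phi=0.6, seed=0):
    """Simulate a zero-mean AR(1) series (illustrative data only)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


def lag_matrix(series, p):
    """Design matrix with an intercept column and lags 1..p."""
    n = len(series)
    cols = [np.ones(n - p)]
    cols += [series[p - k : n - k] for k in range(1, p + 1)]
    return np.column_stack(cols)


def fit_ar(series, p):
    """Least-squares AR(p) fit; returns [intercept, phi_1, ..., phi_p]."""
    coef, *_ = np.linalg.lstsq(lag_matrix(series, p), series[p:], rcond=None)
    return coef


def holdout_mse(series, coef, n_holdout):
    """One-step-ahead MSE over the last n_holdout points, coefficients fixed."""
    p = len(coef) - 1
    pred = lag_matrix(series, p)[-n_holdout:] @ coef
    return float(np.mean((series[-n_holdout:] - pred) ** 2))


def smallest_recent_fraction(series, p=1, target=0.9, n_holdout=200, grid=None):
    """Smallest fraction of recent training data whose efficiency >= target.

    Efficiency is assumed here to be MSE(full fit) / MSE(recent-subset fit).
    """
    if grid is None:
        grid = np.linspace(0.05, 1.0, 20)  # hypothetical candidate fractions
    train = series[:-n_holdout]
    mse_full = holdout_mse(series, fit_ar(train, p), n_holdout)
    for frac in grid:
        # Keep at least a few observations per parameter in the subset.
        m = max(int(np.ceil(frac * len(train))), 3 * (p + 1))
        coef = fit_ar(train[-m:], p)
        eff = mse_full / holdout_mse(series, coef, n_holdout)
        if eff >= target:
            return float(frac), float(eff)
    return 1.0, 1.0


if __name__ == "__main__":
    x = simulate_ar1(2000)
    frac, eff = smallest_recent_fraction(x, p=1, target=0.9)
    print(f"fraction of recent data: {frac:.2f}, efficiency: {eff:.3f}")
```

Because this sketch relies on a single holdout window, the estimated efficiency is noisy and can exceed 1 for some subsets; the paper instead grounds its criterion in asymptotic prediction theory for AR models.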
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Stock Market Forecasting Methods
