A Policy Gradient-Based Sequence-to-Sequence Method for Time Series Prediction
Qi Sima, Xinze Zhang, Yukun Bao, Siyue Yang, Liang Shen

TL;DR
This paper introduces a reinforcement learning approach using policy gradients to improve sequence-to-sequence models for time series prediction, addressing exposure bias and error accumulation issues.
Contribution
The paper proposes a training paradigm in which a policy network adaptively selects the decoder's inputs, with the aim of improving forecasting accuracy and stability.
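To make the idea of policy-based input selection concrete, here is a minimal sketch using a REINFORCE-style policy gradient update. It is not the authors' implementation. Every name in it is hypothetical: the toy series, the `one_step_model` forecaster, the two candidate input sources (the decoder's own previous prediction and an assumed auxiliary estimate), and the reward (negative summed absolute error). The "policy network" is reduced to one pair of logits per decoder step.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_step_model(x):
    # Hypothetical imperfect one-step forecaster standing in for the decoder.
    return 0.95 * x


def run_episode(actions, x0, truth):
    """Decode one horizon; actions[t] picks the input fed at step t.

    0 -> the decoder's own previous prediction
    1 -> an assumed auxiliary estimate of the previous value
    Returns the reward: negative summed absolute forecast error.
    """
    prev_pred, total_err = x0, 0.0
    for t, a in enumerate(actions):
        # Assumption: the auxiliary source is slightly biased but does not drift.
        aux = x0 if t == 0 else truth[t - 1] + 0.01
        pred = one_step_model(aux if a == 1 else prev_pred)
        total_err += abs(pred - truth[t])
        prev_pred = pred
    return -total_err


def train_policy(x0, truth, episodes=3000, lr=2.0, seed=0):
    rng = random.Random(seed)
    logits = [[0.0, 0.0] for _ in truth]  # one categorical policy per step
    baseline = 0.0                        # running mean reward, reduces variance
    for _ in range(episodes):
        probs = [softmax(row) for row in logits]
        actions = [0 if rng.random() < p[0] else 1 for p in probs]
        reward = run_episode(actions, x0, truth)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for t, a in enumerate(actions):
            for k in range(2):
                # REINFORCE: d log pi(a) / d logit_k = 1[k == a] - pi(k)
                grad = (1.0 if k == a else 0.0) - probs[t][k]
                logits[t][k] += lr * advantage * grad
    return [softmax(row) for row in logits]


# Toy ground truth: x_{t+1} = 0.9 * x_t, starting from x0 = 1.0.
truth = [0.9 ** (k + 1) for k in range(4)]
probs = train_policy(1.0, truth)
```

In this toy setup both candidates coincide at the first step, so the policy has no reason to prefer either there. At later steps the auxiliary input yields a smaller error, so the learned probability of selecting it should move above 0.5.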
Findings
Improved multi-step forecasting accuracy over traditional methods
Enhanced stability in long-term predictions
Effective input selection strategy demonstrated on diverse datasets
Abstract
Sequence-to-sequence architectures built upon recurrent neural networks have become a standard choice for multi-step-ahead time series prediction. In these models, the decoder produces future values conditioned on contextual inputs, typically either actual historical observations (ground truth) or previously generated predictions. During training, feeding ground-truth values helps stabilize learning but creates a mismatch between training and inference conditions, known as exposure bias, since such true values are inaccessible during real-world deployment. On the other hand, using the model's own outputs as inputs at test time often causes errors to compound rapidly across prediction steps. To mitigate these limitations, we introduce a new training paradigm grounded in reinforcement learning: a policy gradient-based method to learn an adaptive input selection strategy for…
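The two decoding regimes described in the abstract can be illustrated with a small sketch. It is only a toy under stated assumptions: the series follows x_{t+1} = 0.9 x_t, and `one_step_model` is a hypothetical forecaster with a slightly wrong coefficient. Neither comes from the paper. The sketch compares per-step absolute errors when the decoder is fed ground truth (teacher forcing) and when it is fed its own previous predictions (free running).

```python
def one_step_model(x):
    # Hypothetical one-step forecaster with a slightly wrong coefficient.
    return 0.95 * x


def true_series(x0, horizon):
    # Toy ground-truth process: x_{t+1} = 0.9 * x_t.
    values, x = [], x0
    for _ in range(horizon):
        x = 0.9 * x
        values.append(x)
    return values


def decode(x0, truth, teacher_forcing):
    # Roll the forecaster forward, choosing what is fed back at each step.
    preds, prev = [], x0
    for t in range(len(truth)):
        pred = one_step_model(prev)
        preds.append(pred)
        # Teacher forcing feeds the true value; free running feeds the prediction.
        prev = truth[t] if teacher_forcing else pred
    return preds


def abs_errors(preds, truth):
    return [abs(p - y) for p, y in zip(preds, truth)]


x0, horizon = 1.0, 5
truth = true_series(x0, horizon)
tf_err = abs_errors(decode(x0, truth, teacher_forcing=True), truth)
fr_err = abs_errors(decode(x0, truth, teacher_forcing=False), truth)
```

The first-step error is the same in both regimes because both start from the last observed value. After that, the free-running errors grow with each step while the teacher-forced errors stay small. This is the compounding effect the abstract describes, and teacher forcing is not an option at inference time because the true values are unknown.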
Taxonomy
Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Neural Networks and Applications
