Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information
Ziyi Zhang, Yorie Nakahira, Guannan Qu

TL;DR
This paper introduces a look-ahead prediction-based algorithm for non-stationary MDPs that achieves low regret and adapts to prediction errors, with theoretical guarantees and simulation validation.
Contribution
It develops an algorithm that uses look-ahead predictions in non-stationary MDPs and provides regret bounds that decay exponentially with the look-ahead window size.
Findings
Regret decreases exponentially with look-ahead window size.
Regret does not explode even when the prediction error grows sub-exponentially.
Simulations confirm effectiveness in non-stationary environments.
Abstract
Policy design in non-stationary Markov Decision Processes (MDPs) is inherently challenging due to the complexities introduced by time-varying system transition and reward, which make it difficult for learners to determine the optimal actions for maximizing cumulative future rewards. Fortunately, in many practical applications, such as energy systems, look-ahead predictions are available, including forecasts for renewable energy generation and demand. In this paper, we leverage these look-ahead predictions and propose an algorithm designed to achieve low regret in non-stationary MDPs by incorporating such predictions. Our theoretical analysis demonstrates that, under certain assumptions, the regret decreases exponentially as the look-ahead window expands. When the system prediction is subject to error, the regret does not explode even if the prediction error grows sub-exponentially as a…
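The abstract does not spell out the algorithm itself. As a minimal sketch of one way look-ahead predictions can drive decisions, the Python below assumes a receding-horizon scheme: at each step the controller runs backward induction over the predicted window of k steps and applies only the first action. This is an illustrative assumption, not necessarily the authors' method. The names `lookahead_action` and `run`, the array layouts, the zero terminal value at the end of the window, and the use of exact (error-free) predictions are all hypothetical choices made for the example.

```python
import numpy as np


def lookahead_action(state, P_pred, r_pred):
    """Choose an action by backward induction over a predicted window.

    P_pred: array of shape (k, S, A, S), predicted transition kernels.
    r_pred: array of shape (k, S, A), predicted rewards.
    Assumption: the value beyond the window is zero.
    """
    k, S, A, _ = P_pred.shape
    V = np.zeros(S)  # assumed terminal value at the end of the window
    for h in reversed(range(k)):
        # Q[s, a] = predicted reward + expected value of the next state
        Q = r_pred[h] + P_pred[h] @ V
        V = Q.max(axis=1)
    # After the loop, Q holds the values for the first step of the window.
    return int(Q[state].argmax())


def run(T, k, P, r, s0, rng):
    """Simulate T steps, replanning at every step with a k-step window.

    P: (T, S, A, S) true kernels, r: (T, S, A) true rewards.
    Assumption: predictions equal the true future (no prediction error).
    """
    s, total = s0, 0.0
    for t in range(T):
        end = min(t + k, T)  # the window is truncated near the horizon
        a = lookahead_action(s, P[t:end], r[t:end])
        total += float(r[t, s, a])
        s = int(rng.choice(P.shape[1], p=P[t, s, a]))
    return total
```

Calling `run` with increasing `k` on the same instance gives a rough feel for how a longer window changes the collected reward. Studying prediction error would require passing perturbed copies of `P` and `r` to `lookahead_action`, which this sketch leaves out.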
Taxonomy
Topics: Advanced Algorithms and Applications · Industrial Technology and Control Systems
