Faster Reinforcement Learning by Freezing Slow States
Yijia Wang, Daniel R. Jiang

TL;DR
This paper introduces a frozen-state approximation for MDPs with fast-slow structure: slow states are held fixed during lower-level planning, which reduces computational cost while maintaining high-quality policies. The approach is supported by theoretical analysis and empirical benchmarks.
Contribution
The paper proposes a novel frozen-state approximation method for fast-slow MDPs, enabling efficient planning by decoupling slow and fast state dynamics.
Findings
Significantly reduces computation time in benchmark problems.
Maintains high policy quality comparable to full-state methods.
Omitting slow states without freezing leads to poorer performance.
Abstract
We study infinite horizon Markov decision processes (MDPs) with "fast-slow" structure, where some state variables evolve rapidly ("fast states") while others change more gradually ("slow states"). This structure commonly arises in practice when decisions must be made at high frequencies over long horizons, and where slowly changing information still plays a critical role in determining optimal actions. Examples include inventory control under slowly changing demand indicators or dynamic pricing with gradually shifting consumer behavior. Modeling the problem at the natural decision frequency leads to MDPs with discount factors close to one, making them computationally challenging. We propose a novel approximation strategy that "freezes" slow states during phases of lower-level planning and subsequently applies value iteration to an auxiliary upper-level MDP that evolves on a slower…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Age of Information Optimization
