Improving Generalization in Mountain Car Through the Partitioned Parameterized Policy Approach via Quasi-Stochastic Gradient Descent
Caleb M. Bowyer

TL;DR
This paper introduces a partitioned parameterized policy approach optimized with quasi-Stochastic Gradient Descent for the Mountain Car problem, significantly improving policy generalization and avoiding local traps.
Contribution
It proposes a novel partitioned policy method that learns region-specific parameters, enhancing generalization over traditional uniform policies in reinforcement learning.
Findings
Partitioned policy outperforms uniform policy in Mountain Car
Method reduces circular trajectory trapping
Achieves better generalization in policy learning
Abstract
The reinforcement learning problem of finding a control policy that minimizes the minimum time objective for the Mountain Car environment is considered. Particularly, a class of parameterized nonlinear feedback policies is optimized over to reach the top of the highest mountain peak in minimum time. The optimization is carried out using quasi-Stochastic Gradient Descent (qSGD) methods. In attempting to find the optimal minimum time policy, a new parameterized policy approach is considered that seeks to learn an optimal policy parameter for different regions of the state space, rather than rely on a single macroscopic policy parameter for the entire state space. This partitioned parameterized policy approach is shown to outperform the uniform parameterized policy approach and lead to greater generalization than prior methods, where the Mountain Car became trapped in circular trajectories…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research
