Improving Generalization in Mountain Car Through the Partitioned Parameterized Policy Approach via Quasi-Stochastic Gradient Descent

Caleb M. Bowyer

arXiv:2105.13986·cs.LG·May 31, 2021·1 cites

Open Access

TL;DR

This paper introduces a partitioned parameterized policy approach, optimized with quasi-Stochastic Gradient Descent (qSGD), for the Mountain Car problem. It improves policy generalization over a uniform policy and avoids the circular trajectories in which the car otherwise becomes trapped.

Contribution

The paper proposes a partitioned policy method that learns a separate policy parameter for each region of the state space, improving generalization over a single uniform policy parameter for the whole state space.

Findings

01. Partitioned policy outperforms uniform policy in Mountain Car

02. Method reduces circular trajectory trapping

03. Achieves better generalization in policy learning

Abstract

The reinforcement learning problem of finding a minimum-time control policy for the Mountain Car environment is considered. In particular, a class of parameterized nonlinear feedback policies is optimized so that the car reaches the top of the highest mountain peak in minimum time. The optimization is carried out using quasi-Stochastic Gradient Descent (qSGD) methods. In attempting to find the optimal minimum-time policy, a new parameterized policy approach is considered that seeks to learn an optimal policy parameter for each of several regions of the state space, rather than relying on a single macroscopic policy parameter for the entire state space. This partitioned parameterized policy approach is shown to outperform the uniform parameterized policy approach and to lead to greater generalization than prior methods, where the Mountain Car became trapped in circular trajectories…
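The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the classic Mountain Car dynamics, a hypothetical partition of the position axis into equal bins with one scalar parameter per bin, a hypothetical smooth feedback law `tanh(gain * (v - theta_k))`, and a gradient-free form of qSGD in which deterministic sinusoids replace random perturbations. The names `region`, `time_to_goal`, `qsgd`, the probing frequencies, and all step sizes are illustrative choices.

```python
import math

X_MIN, X_MAX, V_MAX, GOAL = -1.2, 0.5, 0.07, 0.5

def step(x, v, u):
    """One step of the classic Mountain Car dynamics with force u in [-1, 1]."""
    v = v + 0.001 * u - 0.0025 * math.cos(3.0 * x)
    v = max(-V_MAX, min(V_MAX, v))
    x = x + v
    if x < X_MIN:  # inelastic left wall
        x, v = X_MIN, 0.0
    return x, v

def region(x, n_regions):
    """Index of the equal-width position bin containing x (assumed partition)."""
    frac = (x - X_MIN) / (X_MAX - X_MIN)
    return min(n_regions - 1, max(0, int(frac * n_regions)))

def time_to_goal(theta, x0=-0.5, v0=0.0, horizon=500, gain=50.0):
    """Steps until x >= GOAL under the partitioned policy, capped at horizon."""
    x, v = x0, v0
    for t in range(horizon):
        if x >= GOAL:
            return t
        th = theta[region(x, len(theta))]  # parameter of the current region
        u = math.tanh(gain * (v - th))     # hypothetical feedback law
        x, v = step(x, v, u)
    return horizon

def qsgd(cost, theta0, n_iter=100, eps=0.005, step0=1e-6):
    """Gradient-free descent with deterministic sinusoidal probing (assumed form)."""
    theta = list(theta0)
    best_theta, best_cost = list(theta), cost(theta)
    for n in range(1, n_iter + 1):
        # One sinusoid per coordinate, with distinct illustrative frequencies.
        xi = [math.sqrt(2.0) * math.sin((1.0 + 0.37 * i) * n)
              for i in range(len(theta))]
        plus = [t + eps * p for t, p in zip(theta, xi)]
        minus = [t - eps * p for t, p in zip(theta, xi)]
        diff = (cost(plus) - cost(minus)) / (2.0 * eps)  # two-sided probe
        a_n = step0 / n ** 0.7                           # decaying step size
        theta = [max(-V_MAX, min(V_MAX, t - a_n * diff * p))
                 for t, p in zip(theta, xi)]
        c = cost(theta)
        if c < best_cost:  # keep the best iterate seen so far
            best_theta, best_cost = list(theta), c
    return best_theta, best_cost

if __name__ == "__main__":
    uniform = qsgd(time_to_goal, [0.0])            # one parameter for all states
    partitioned = qsgd(time_to_goal, [0.0] * 4)    # one parameter per region
    print("uniform:", uniform)
    print("partitioned:", partitioned)
```

Passing a one-element `theta0` gives the uniform policy and a longer list gives the partitioned one, so the two can be compared on the same cost function. Whether the partitioned variant wins in this toy setup depends on the assumed feedback law and step sizes, so the sketch should not be read as reproducing the paper's results.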

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research