The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming

Michael Fairbank

arXiv:1302.5565 · cs.LG · February 25, 2013 · 1 citation

TL;DR

This paper highlights the critical role of "clipping" in discretized-time neurocontrol and adaptive dynamic programming. It shows that truncating the final time step so that the agent stops exactly at the first terminal state it reaches can significantly improve learning performance.

Contribution

It identifies proper clipping as important for gradient-based neurocontrol and adaptive dynamic programming algorithms, showing that it determines whether learning reaches the optimum and how well it performs.

Findings

01

Proper clipping improves learning performance.

02

Omission of clipping can prevent reaching the optimal solution.

03

Clipping mainly affects methods that compute a learning gradient from explicit derivatives of the environment's model functions.

Abstract

In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do "clipping" on the motion of the agent in the final time step of the trajectory. By clipping we mean that the final time step of the trajectory is to be truncated such that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum; and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms which use explicit derivatives of the model functions of the environment to calculate a learning gradient.…
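To make the abstract's definition of clipping concrete, here is a minimal sketch on a hypothetical one-dimensional problem (not one of the paper's experiments). The agent starts at x = 0, moves as x_{t+1} = x_t + u·Δt under a constant action u, and terminates on first reaching x ≥ 1; each step costs Δt(1 + K u²). The derivative dJ/du is carried explicitly through the model by the chain rule, in the spirit of the gradient-based methods the paper discusses. With clipping, the final step is scaled by the fraction f = (1 - x)/(u·Δt) that lands the agent exactly on the boundary; without clipping, the agent overshoots and pays for a full step. The constants and the names `cost_and_grad`, `descend` and `U_MIN` are illustrative assumptions.

```python
# Hypothetical toy problem illustrating clipping; not the paper's benchmark.
K = 0.25      # weight on action cost (assumed)
DT = 0.3      # length of one discrete time step (assumed)
GOAL = 1.0    # terminal states are x >= GOAL
U_MIN = 0.1   # lower bound keeping the action positive so trajectories terminate


def cost_and_grad(u, clip):
    """Roll out x_{t+1} = x_t + u*DT from x = 0 until x >= GOAL.

    Returns the total cost J and dJ/du, with the derivative carried
    explicitly through the model via the chain rule.
    """
    assert u > 0, "action must be positive for the trajectory to terminate"
    x, dx_du = 0.0, 0.0
    J, dJ_du = 0.0, 0.0
    step_cost = DT * (1 + K * u * u)    # cost of one full time step
    d_step_cost = DT * 2 * K * u        # its derivative w.r.t. u
    while x < GOAL:
        x_next = x + u * DT
        if clip and x_next >= GOAL:
            # Clipping: take only the fraction f of the final step that
            # brings the agent exactly onto the terminal boundary.
            f = (GOAL - x) / (u * DT)
            df_du = (-dx_du * u * DT - (GOAL - x) * DT) / (u * DT) ** 2
            J += f * step_cost
            dJ_du += df_du * step_cost + f * d_step_cost
            break
        # Full step (always taken when clip is False, so the agent overshoots).
        J += step_cost
        dJ_du += d_step_cost
        x, dx_du = x_next, dx_du + DT   # x_t = t*u*DT, so dx_t/du = t*DT
    return J, dJ_du


def descend(clip, u=1.0, lr=0.5, iters=200):
    """Plain gradient descent on u using the explicit gradient."""
    for _ in range(iters):
        _, g = cost_and_grad(u, clip)
        u = max(u - lr * g, U_MIN)
    return u


u_clip, u_noclip = descend(clip=True), descend(clip=False)
print(f"clipped:   u = {u_clip:.4f}, J = {cost_and_grad(u_clip, True)[0]:.4f}")
print(f"unclipped: u = {u_noclip:.4f}, J = {cost_and_grad(u_noclip, False)[0]:.4f}")
```

Under these assumptions the clipped gradient reduces to -1/u² + K, so descent settles near u = 2, the minimiser of the continuous-time cost (1 + K u²)/u. Without clipping, the step count is piecewise constant in u and its jumps are invisible to the explicit derivative; the gradient stays positive, and descent drives u down to `U_MIN`, where the true total cost is far higher.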

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research