How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

Vincent François-Lavet; Raphael Fonteneau; Damien Ernst

arXiv:1512.02011·cs.LG·January 21, 2016·80 cites

Open Access

TL;DR

This paper explores dynamic discounting strategies in deep reinforcement learning, demonstrating that gradually increasing the discount factor and combining it with a variable learning rate can improve learning efficiency and stability.

Contribution

The paper proposes progressively increasing the discount factor while training a deep Q-network (DQN), and reports faster and more stable learning than with a fixed discount factor.

Findings

01  Gradually increasing the discount factor up to its final value reduces the number of learning steps.

02  Combining the increasing discount factor with a varying learning rate outperforms the original DQN on several experiments.

03  The learning process can fall into a local optimum, which ties dynamic discounting to the exploration/exploitation balance.

Abstract

Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). When the discount factor progressively increases up to its final value, we empirically show that it is possible to significantly reduce the number of learning steps. When this increasing discount factor is used in conjunction with a varying learning rate, we empirically show that it outperforms the original DQN on several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate Dynamic Programming setting. We also describe the possibility of falling into a local optimum during the learning process, thus connecting our discussion with…
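To make the idea in the abstract concrete, the following is a minimal sketch of a discount factor that rises toward its final value across training epochs, paired with a decaying learning rate. The geometric update rule, the function names, and all constants (`gamma0`, `gamma_final`, `shrink`, `lr0`, `decay`) are illustrative assumptions and are not stated on this page. The `q_target` helper shows the standard one-step Q-learning target, which is where the discount factor enters a DQN update.

```python
def discount_schedule(gamma0, gamma_final, shrink, epochs):
    """Return one discount factor per epoch, rising from gamma0 to gamma_final.

    Assumed rule (illustrative): the gap (1 - gamma) is multiplied by
    `shrink` after every epoch, and the result is capped at gamma_final.
    """
    gammas = []
    gamma = gamma0
    for _ in range(epochs):
        gammas.append(gamma)
        gamma = min(gamma_final, 1.0 - shrink * (1.0 - gamma))
    return gammas


def learning_rate_schedule(lr0, decay, epochs):
    """Return one learning rate per epoch, decayed geometrically (assumed rule)."""
    return [lr0 * decay ** k for k in range(epochs)]


def q_target(reward, next_q_values, gamma, terminal):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    # Hypothetical settings chosen only for demonstration.
    gammas = discount_schedule(gamma0=0.9, gamma_final=0.99, shrink=0.98, epochs=200)
    rates = learning_rate_schedule(lr0=0.00025, decay=0.98, epochs=200)
    for epoch in (0, 50, 100, 199):
        print(epoch, round(gammas[epoch], 4), rates[epoch])
```

With a small discount factor early on, the target depends mostly on immediate rewards; as the discount factor grows, bootstrapped value estimates carry more weight. Whether this particular schedule helps on a given task is an empirical question that the sketch does not answer.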

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Evolutionary Algorithms and Applications

Methods: Q-Learning · Dense Connections · Convolution · Deep Q-Network