Concurrent Credit Assignment for Data-efficient Reinforcement Learning
Emmanuel Daucé

TL;DR
This paper introduces a novel reinforcement learning approach that uses a variational occupancy model to improve exploration efficiency, leading to faster training and higher returns in continuous action tasks.
Contribution
It proposes a concurrent credit assignment method leveraging a variational occupancy model to enhance data efficiency in reinforcement learning.
Findings
Significant reduction in training time.
Higher returns in continuous control benchmarks.
Effective in both dense and sparse reward settings.
Abstract
The ability to sample the state and action spaces widely is a key ingredient in building effective reinforcement learning algorithms. The variational optimization principles developed in this paper emphasize the importance of an occupancy model that synthesizes the general distribution of the environmental states over which the agent can act (defining a virtual "territory"). The occupancy model is updated frequently as exploration progresses and new states are uncovered during training. Under a uniform prior assumption, the resulting objective expresses a balance between two concurrent tendencies, namely the widening of the occupancy space and the maximization of rewards, reminiscent of the classical exploration/exploitation trade-off. Implemented on top of an off-policy actor-critic and tested on classic continuous-action benchmarks, it is shown to…
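The trade-off described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper: the variational occupancy model is replaced by a smoothed histogram over a bounded, discretized state space, and the "widening" tendency is expressed as a per-state bonus -β log q(s) added to the reward. The sketch uses one standard identity: under a uniform prior u over N cells, KL(q‖u) = Σ q log q + log N = log N − H(q), so penalizing divergence from the uniform prior is the same as rewarding occupancy entropy. The names `HistogramOccupancy`, `shaped_reward` and `beta` are hypothetical.

```python
import numpy as np


class HistogramOccupancy:
    """Hypothetical stand-in for an occupancy model: a smoothed histogram
    over a bounded, discretized state space (NOT the paper's variational model)."""

    def __init__(self, low, high, bins, alpha=1.0):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = bins
        # alpha > 0 acts as Laplace smoothing, so every cell has nonzero mass
        self.counts = np.full((bins,) * self.low.size, float(alpha))

    def _index(self, state):
        # Map a continuous state to its histogram cell, clipping to the bounds
        frac = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def update(self, state):
        # Called on every newly visited state as exploration progresses
        self.counts[self._index(state)] += 1.0

    def prob(self, state):
        # Probability mass q(s) of the cell containing `state`
        return self.counts[self._index(state)] / self.counts.sum()

    def kl_to_uniform(self):
        # KL(q || uniform) = sum q log q + log N = log N - H(q)
        q = self.counts / self.counts.sum()
        return float(np.sum(q * np.log(q)) + np.log(q.size))


def shaped_reward(reward, state, occupancy, beta=0.1):
    """Extrinsic reward plus a beta-weighted -log q(s) bonus, which is larger
    for rarely visited states and thus pushes toward a wider occupancy."""
    return reward - beta * np.log(occupancy.prob(state))


occ = HistogramOccupancy(low=[0.0], high=[1.0], bins=4)
for _ in range(6):
    occ.update([0.1])  # repeatedly visit the first cell
print(occ.prob([0.1]), occ.prob([0.9]))
print(shaped_reward(1.0, [0.9], occ) > shaped_reward(1.0, [0.1], occ))
```

In an off-policy actor-critic loop, one would update the occupancy model with each visited state and use the shaped reward in the critic's target, so β plays the role of the exploration/exploitation weight. How the paper itself parameterizes, weights, or schedules the two terms is not stated in this summary.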
Taxonomy
Topics: Reinforcement Learning in Robotics · Auction Theory and Applications · Adaptive Dynamic Programming Control
