Optimistic critics can empower small actors
Olya Mastikhina, Dhruv Sreenivas, Pablo Samuel Castro

TL;DR
This paper investigates asymmetric actor-critic architectures in deep reinforcement learning. It finds that smaller actors generally degrade performance and lead to overfit critics, traces much of this to poor data collection caused by value underestimation, and explores techniques to mitigate that underestimation.
Contribution
The paper provides a broad empirical analysis of asymmetric actor-critic setups and explores techniques to mitigate value underestimation, with the aim of making smaller actors more effective.
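As a generic illustration of what value underestimation means, the sketch below shows one well-known way a value estimate can become biased downward: taking the minimum of two noisy but individually unbiased estimates. This is an assumption chosen for illustration only. The summary above does not say that this is the mechanism studied in the paper, and the function name and parameters are hypothetical.

```python
import random


def mean_min_of_two(true_q, noise_std, n_samples, seed=0):
    """Average of min(q1, q2), where q1 and q2 are unbiased noisy estimates of true_q.

    Illustrative only: not the paper's method, just a standard source of
    downward bias in value estimates.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        q1 = true_q + rng.gauss(0.0, noise_std)
        q2 = true_q + rng.gauss(0.0, noise_std)
        total += min(q1, q2)
    return total / n_samples


TRUE_Q = 10.0  # hypothetical true action value
estimate = mean_min_of_two(TRUE_Q, noise_std=1.0, n_samples=20000)
print(f"true value: {TRUE_Q:.2f}, mean of min-of-two estimate: {estimate:.2f}")
```

Each individual estimate is centered on the true value, yet the averaged minimum sits below it. An agent that acts on systematically pessimistic values may explore less and collect poorer data, which is the kind of effect the summary attributes to underestimation.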
Findings
Smaller actors generally cause performance degradation and overfit critics.
Poor data collection, due to value underestimation, is one of the main causes of this behavior.
Mitigation techniques can improve asymmetric actor-critic performance.
Abstract
Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use symmetric architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of asymmetric setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest poor data collection, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric…
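To make the symmetric versus asymmetric distinction concrete, here is a minimal sketch that counts the parameters of fully connected actor and critic networks. The observation and action dimensions, the hidden widths, and the two-hidden-layer layout are all hypothetical values chosen for illustration. They are not taken from the paper.

```python
def mlp_param_count(layer_sizes):
    """Number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical environment dimensions (assumptions, not from the paper).
OBS_DIM, ACT_DIM = 17, 6


def actor_sizes(hidden):
    # The actor maps an observation to an action.
    return [OBS_DIM, hidden, hidden, ACT_DIM]


def critic_sizes(hidden):
    # The critic maps an (observation, action) pair to a scalar value.
    return [OBS_DIM + ACT_DIM, hidden, hidden, 1]


# Symmetric: actor and critic share the same hidden widths.
symmetric = (mlp_param_count(actor_sizes(256)),
             mlp_param_count(critic_sizes(256)))

# Asymmetric: the actor is much narrower than the critic.
asymmetric = (mlp_param_count(actor_sizes(32)),
              mlp_param_count(critic_sizes(256)))

print("symmetric  (actor, critic):", symmetric)
print("asymmetric (actor, critic):", asymmetric)
```

Even with matching hidden widths, the two networks differ slightly in size because their input and output dimensions differ. In the asymmetric case the critic is unchanged while the actor shrinks by well over an order of magnitude.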
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Reservoir Computing · Domain Adaptation and Few-Shot Learning
