Search-Based Adversarial Estimates for Improving Sample Efficiency in Off-Policy Reinforcement Learning
Federico Malato, Ville Hautamaki

TL;DR
This paper introduces Adversarial Estimates, a novel method that uses latent similarity search on limited human data to significantly improve sample efficiency in deep reinforcement learning, especially in environments with sparse rewards.
Contribution
The paper proposes a new approach using Adversarial Estimates to enhance sample efficiency in feedback-based DRL algorithms with minimal human data.
Findings
Algorithms with Adversarial Estimates converge faster.
The approach enables learning in environments with sparse rewards.
Uses only five minutes of human-recorded experience.
Abstract
Sample inefficiency is a long-standing challenge in deep reinforcement learning (DRL). Although dramatic improvements have been made, the problem is far from solved and is especially challenging in environments with sparse or delayed rewards. In our work, we propose Adversarial Estimates as a new, simple and efficient approach to mitigate this problem for a class of feedback-based DRL algorithms. Our approach leverages latent similarity search over a small set of human-collected trajectories to boost learning, using only five minutes of human-recorded experience. The results of our study show that algorithms trained with Adversarial Estimates converge faster than their original versions. Moreover, we discuss how our approach could enable learning in feedback-based algorithms in extreme scenarios with very sparse rewards.
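The abstract does not spell out how the latent similarity search is implemented, so the following is only a minimal sketch of the general idea: given a latent encoding of the agent's current observation, look up the most similar latent in a small set of human-recorded steps and return the action the human took there. Everything here is an assumption made for illustration: the function names `build_index` and `nearest_human_action`, the use of cosine similarity, the latent dimension, and the random arrays standing in for encoded human data. How the retrieved action is turned into an Adversarial Estimate is not described in this summary and is not shown.

```python
import numpy as np


def build_index(human_latents):
    """L2-normalise each row so a dot product equals cosine similarity."""
    norms = np.linalg.norm(human_latents, axis=1, keepdims=True)
    return human_latents / np.clip(norms, 1e-8, None)


def nearest_human_action(query_latent, index, human_actions):
    """Return the action of the most similar human step and its similarity."""
    q = query_latent / max(np.linalg.norm(query_latent), 1e-8)
    sims = index @ q                      # cosine similarity to every human step
    best = int(np.argmax(sims))
    return human_actions[best], float(sims[best])


# Hypothetical stand-in data: 500 encoded human steps, 32-d latents, 4 actions.
rng = np.random.default_rng(0)
human_latents = rng.normal(size=(500, 32))
human_actions = rng.integers(0, 4, size=500)

index = build_index(human_latents)

# A query close to human step 42 (small perturbation) should retrieve its action.
query = human_latents[42] + 0.01 * rng.normal(size=32)
action, sim = nearest_human_action(query, index, human_actions)
```

A brute-force search like this is adequate for a dataset of this size (five minutes of play amounts to at most a few thousand steps); an approximate nearest-neighbour index would only matter for much larger demonstration sets.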
