Which Experiences Are Influential for Your Agent? Policy Iteration with Turn-over Dropout
Takuya Hiraoka, Takashi Onishi, Yoshimasa Tsuruoka

TL;DR
This paper introduces PI+ToD, an efficient method for estimating the influence of experiences in reinforcement learning by combining policy iteration with turn-over dropout, demonstrated in MuJoCo environments.
Contribution
The paper proposes PI+ToD, a policy iteration method that uses turn-over dropout to estimate the influence of individual experiences without the cost of agent comparison, which is prohibitively expensive when the replay buffer holds a large number of experiences.
Findings
PI+ToD accurately estimates experience influence.
The method remains computationally efficient with large experience buffers.
PI+ToD is effective in MuJoCo reinforcement learning environments.
Abstract
In reinforcement learning (RL) with experience replay, experiences stored in a replay buffer influence the RL agent's performance. Information about the influence is valuable for various purposes, including experience cleansing and analysis. One method for estimating the influence of individual experiences is agent comparison, but it is prohibitively expensive when there is a large number of experiences. In this paper, we present PI+ToD as a method for efficiently estimating the influence of experiences. PI+ToD is a policy iteration that efficiently estimates the influence of experiences by utilizing turn-over dropout. We demonstrate the efficiency of PI+ToD with experiments in MuJoCo environments.
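The abstract does not spell out how turn-over dropout works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each experience is tied to a fixed binary dropout mask, that the network is updated on that experience only through the masked units, and that influence is read off by comparing the output under the mask with the output under the flipped mask. The network shape, the helper names (`experience_mask`, `predict`, `grad_w2`, `estimated_influence`), and the sign convention are all hypothetical choices made for illustration.

```python
import numpy as np

WIDTH = 8  # hidden width, chosen only for illustration


def experience_mask(exp_id, width=WIDTH, seed=0):
    """Fixed binary mask tied to one experience (hypothetical helper).

    The same exp_id always yields the same mask, so the set of units that
    an experience is allowed to update never changes during training.
    """
    rng = np.random.default_rng([seed, exp_id])
    return (rng.random(width) < 0.5).astype(np.float64)


def hidden(w1, x, mask):
    """ReLU hidden layer with the dropout mask applied to its units."""
    return np.maximum(x @ w1, 0.0) * mask


def predict(w1, w2, x, mask):
    """Scalar output of a one-hidden-layer network under a given mask."""
    return hidden(w1, x, mask) @ w2


def grad_w2(w1, w2, x, target, mask):
    """Gradient of 0.5 * (prediction - target)**2 with respect to w2.

    Entries for dropped units are zero, so those weights receive no
    update from an experience trained under this mask.
    """
    h = hidden(w1, x, mask)
    return (h @ w2 - target) * h


def estimated_influence(w1, w2, x_eval, exp_id):
    """Output under the experience's mask minus output under the flipped mask.

    The flipped-mask sub-network was never updated with this experience,
    so the gap serves as a cheap stand-in for retraining without it.
    The sign convention here is an assumption.
    """
    mask = experience_mask(exp_id)
    return predict(w1, w2, x_eval, mask) - predict(w1, w2, x_eval, 1.0 - mask)
```

Under these assumptions, one trained network yields an influence estimate for every experience with two forward passes each, whereas agent comparison would need a separately trained agent per experience.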
Taxonomy
Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics · Mental Health Research Topics
