TL;DR
This paper introduces a student-initiated action advising method in deep reinforcement learning that uses Random Network Distillation to measure advice novelty, addressing limitations of existing approaches and improving learning efficiency.
Contribution
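The proposed algorithm uses Random Network Distillation (RND) to measure the novelty of advice rather than of states, and updates the RND model only on states where advice was received, which makes it more robust than existing methods based on state novelty.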
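The idea of updating the novelty model only on advised states can be illustrated with a minimal sketch. This is not the paper's implementation: the linear predictor, the `threshold` parameter, and the names `AdviceNoveltyRND`, `maybe_ask_for_advice`, and `teacher_policy` are all assumptions made for illustration, and a real agent would most likely use neural networks for both the target and the predictor.

```python
import numpy as np


class AdviceNoveltyRND:
    """Toy RND: a fixed random target and a trainable linear predictor (assumed form)."""

    def __init__(self, state_dim, embed_dim=8, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.target = rng.normal(size=(embed_dim, state_dim))     # never trained
        self.predictor = rng.normal(size=(embed_dim, state_dim))  # trained on advised states
        self.lr = lr

    def novelty(self, state):
        # Mean squared error between predictor and target embeddings.
        err = (self.predictor - self.target) @ state
        return float(np.mean(err ** 2))

    def update(self, state):
        # One gradient-descent step on the novelty (MSE) for this state.
        err = (self.predictor - self.target) @ state
        grad = 2.0 * np.outer(err, state) / err.size
        self.predictor -= self.lr * grad


def maybe_ask_for_advice(rnd, state, teacher_policy, budget, threshold):
    """Return (advised action or None, remaining budget).

    `threshold` is a hypothetical hyperparameter for this sketch.
    """
    if budget > 0 and rnd.novelty(state) > threshold:
        action = teacher_policy(state)
        # Only advised states train the predictor, so novelty tracks
        # "how often was advice collected here", not "how often was this state seen".
        rnd.update(state)
        return action, budget - 1
    return None, budget
```

Because the predictor is never trained on states the student merely visits, a state remains novel until advice has actually been collected for it, which is one way to read the robustness claim above.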
Findings
Performs comparably to state-of-the-art methods in standard scenarios.
Shows significant advantages in scenarios where existing methods fail.
Effectively mitigates feedback lag issues in advice timing.
Abstract
Action advising is a budget-constrained knowledge exchange mechanism between teacher-student peers that can help tackle exploration and sample inefficiency problems in deep reinforcement learning (RL). Most recently, student-initiated techniques that utilise state novelty and uncertainty estimations have obtained promising results. However, the approaches built on these estimations have some potential weaknesses. First, they assume that the convergence of the student's RL model implies less need for advice. This can be misleading in scenarios where the teacher is absent early on: the student is likely to learn suboptimally by itself, yet still ignore the teacher's assistance later. Second, the delay between encountering states and having them take effect in the RL model updates, in the presence of experience replay dynamics, causes a feedback lag in what the student actually needs…
Taxonomy
Methods: Experience Replay
