CANDERE-COACH: Reinforcement Learning from Noisy Feedback
Yuxuan Li, Srijita Das, Matthew E. Taylor

TL;DR
CANDERE-COACH introduces a reinforcement learning method that effectively learns from noisy binary feedback, filtering out errors to improve learning performance even when feedback accuracy drops to 60%.
Contribution
The paper presents a novel noise-filtering mechanism for RL from noisy binary feedback, enabling learning with up to 40% incorrect teacher signals.
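To make the "40% incorrect teacher signals" setting concrete, the following is a minimal sketch of a simulated noisy teacher. It only illustrates the feedback model (binary signals flipped with a fixed probability); it does not implement the paper's filtering mechanism. The function name `noisy_feedback`, the +1/-1 encoding, and the uniform flip model are assumptions made for illustration and do not come from the paper.

```python
import random


def noisy_feedback(is_good_action: bool, noise_rate: float, rng: random.Random) -> int:
    """Hypothetical teacher: returns +1 (positive) or -1 (negative) feedback.

    The correct label is flipped with probability `noise_rate`
    (an assumed, uniform noise model).
    """
    correct = 1 if is_good_action else -1
    return -correct if rng.random() < noise_rate else correct


# Simulate a teacher evaluating a good action many times at a 40% flip rate.
rng = random.Random(0)
n = 10_000
labels = [noisy_feedback(True, 0.4, rng) for _ in range(n)]

# Fraction of signals that agree with the true evaluation.
accuracy = sum(1 for f in labels if f == 1) / n
print(f"empirical feedback accuracy: {accuracy:.3f}")
```

Under this assumed model, a 40% flip rate leaves roughly 60% of the signals correct, which matches the accuracy figure quoted in the TL;DR.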
Findings
Learns effectively even when up to 40% of teacher feedback is incorrect
Outperforms baseline methods in three domains
Demonstrates robustness to teacher feedback errors
Abstract
Reinforcement learning (RL) has recently been applied to many challenging tasks. However, to perform well, it requires access to a good reward function, which is often sparse or manually engineered with scope for error. Introducing human prior knowledge, through approaches such as imitation learning, learning from preferences, and inverse reinforcement learning, is often seen as a possible solution to this problem. Learning from feedback is another framework that enables an RL agent to learn from binary evaluative signals describing the teacher's (positive or negative) evaluation of the agent's action. However, these methods often assume that evaluative teacher feedback is perfect, which is a restrictive assumption. In practice, such feedback can be noisy due to limited teacher expertise or other exacerbating factors like cognitive load, availability,…
Taxonomy
Topics: Neural Networks and Applications · Reinforcement Learning in Robotics
