TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning
Gaurav Chaudhary, Laxmidhar Behera

TL;DR
TEACH introduces a goal curriculum driven by temporal variance in Q-values, dynamically focusing on uncertain goals to improve sample efficiency and learning speed in multi-goal reinforcement learning tasks.
Contribution
The paper proposes a novel Student-Teacher framework with a temporal variance-driven curriculum, providing theoretical insights and demonstrating improved performance across diverse tasks.
Findings
Consistent performance improvements over state-of-the-art methods
Effective goal prioritization based on temporal variance of Q-values
Seamless integration with existing RL algorithms
Abstract
Reinforcement Learning (RL) has achieved significant success in solving single-goal tasks. However, uniform goal selection often results in sample inefficiency in multi-goal settings where agents must learn a universal goal-conditioned policy. Inspired by the adaptive and structured learning processes observed in biological systems, we propose a novel Student-Teacher learning paradigm with a Temporal Variance-Driven Curriculum to accelerate Goal-Conditioned RL. In this framework, the teacher module dynamically prioritizes goals with the highest temporal variance in the policy's confidence score, parameterized by the state-action value (Q) function. The teacher provides an adaptive and focused learning signal by targeting these high-uncertainty goals, fostering continual and efficient progress. We establish a theoretical connection between the temporal variance of Q-values and the…
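The teacher's selection rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The following choices are hypothetical and made here for illustration only:
- the class name `TemporalVarianceTeacher`;
- the `q_fn(goal)` interface, which returns one scalar confidence score per goal (for example, Q at a fixed start state under the current policy);
- the sliding window of recent checkpoints;
- population variance as the variance measure;
- top-k selection.

```python
from collections import deque
from statistics import pvariance
from typing import Callable, Deque, Dict, Hashable, List


class TemporalVarianceTeacher:
    """Hypothetical teacher: ranks candidate goals by how much the student's
    Q-value estimate for each goal has varied over recent checkpoints."""

    def __init__(self, q_fn: Callable[[Hashable], float], window: int = 5):
        # q_fn(goal) -> scalar confidence score for that goal under the
        # current student; this interface is an assumption for illustration.
        self.q_fn = q_fn
        self.window = window
        self.history: Dict[Hashable, Deque[float]] = {}

    def record(self, goals: List[Hashable]) -> None:
        # Call once per checkpoint: snapshot the current Q estimate of each goal.
        # deque(maxlen=window) keeps only the most recent `window` snapshots.
        for g in goals:
            self.history.setdefault(g, deque(maxlen=self.window)).append(self.q_fn(g))

    def temporal_variance(self, goal: Hashable) -> float:
        values = list(self.history.get(goal, ()))
        # With fewer than two snapshots there is no temporal signal yet.
        return pvariance(values) if len(values) >= 2 else 0.0

    def select(self, goals: List[Hashable], k: int) -> List[Hashable]:
        # Prioritize the k goals whose Q estimates changed the most
        # (top-k is an assumed rule; proportional sampling is another option).
        return sorted(goals, key=self.temporal_variance, reverse=True)[:k]


if __name__ == "__main__":
    # Toy stand-in for a learning student: a lookup table of Q estimates.
    q_table = {"g_easy": 0.9, "g_hard": 0.0, "g_mid": 0.3}
    teacher = TemporalVarianceTeacher(q_fn=lambda g: q_table[g], window=3)
    goals = list(q_table)
    for _ in range(3):
        teacher.record(goals)
        q_table["g_mid"] += 0.2    # actively being learned: large changes
        q_table["g_easy"] += 0.01  # already mastered: small changes
    print(teacher.select(goals, k=1))  # ['g_mid']
```

In this toy run, the already mastered goal and the goal with no progress both show little temporal variance. The goal whose estimate is still moving is selected, which matches the intuition of focusing on high-uncertainty goals. The teacher only queries the student's Q-function, so a sketch like this can sit in front of an existing learner's goal sampler without modifying the learner. The exact selection rule the paper uses is not given in this summary.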
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Evolutionary Algorithms and Applications
