Proximal Curriculum with Task Correlations for Deep Reinforcement Learning
Georgios Tzannetos, Parameswaran Kamalaruban, Adish Singla

TL;DR
This paper introduces ProCuRL-Target, a curriculum strategy for deep reinforcement learning that leverages task correlations and the Zone of Proximal Development to accelerate learning towards complex target task distributions.
Contribution
The paper proposes a novel curriculum method that uses task correlations to balance task difficulty against progression toward the target tasks, supported by theoretical justification and empirical comparisons with baselines.
Findings
ProCuRL-Target outperforms state-of-the-art baselines in various domains.
The curriculum accelerates training of deep RL agents on complex tasks.
Task correlation-based curriculum effectively guides learning toward target distributions.
Abstract
Curriculum design for reinforcement learning (RL) can speed up an agent's learning process and help it learn to perform well on complex tasks. However, existing techniques typically require domain-specific hyperparameter tuning, involve expensive optimization procedures for task selection, or are suitable only for specific learning objectives. In this work, we consider curriculum design in contextual multi-task settings where the agent's final performance is measured w.r.t. a target distribution over complex tasks. We base our curriculum design on the Zone of Proximal Development concept, which has proven to be effective in accelerating the learning process of RL agents for a uniform distribution over all tasks. We propose a novel curriculum, ProCuRL-Target, that effectively balances the need for selecting tasks that are not too difficult for the agent while progressing the agent's…
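The balance described in the abstract (tasks that are neither too hard nor too easy, yet related to the target tasks) can be illustrated with a minimal sketch. This is not the paper's ProCuRL-Target selection rule, which is not given in the text above. The scoring function `zpd_score`, the sampler `select_task`, the temperature `beta`, and the per-task `similarity` values are all hypothetical, chosen only to show the general idea: a Zone-of-Proximal-Development style term `p * (1 - p)` peaks at intermediate success probability, and an assumed similarity weight favors tasks correlated with the target distribution.

```python
import math
import random


def zpd_score(p_success, similarity_to_target):
    """Illustrative score (an assumption, not the paper's formula).

    p_success * (1 - p_success) is largest for tasks of intermediate
    difficulty; similarity_to_target in [0, 1] up-weights tasks assumed
    to be correlated with the target tasks.
    """
    return p_success * (1.0 - p_success) * similarity_to_target


def select_task(p_success, similarity, beta=10.0, rng=None):
    """Sample a task index from a softmax over the illustrative scores."""
    rng = rng or random.Random(0)
    scores = [zpd_score(p, c) for p, c in zip(p_success, similarity)]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (s - top)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


# Three hypothetical tasks: too hard, intermediate, too easy.
p_success = [0.05, 0.5, 0.95]
similarity = [1.0, 0.8, 1.0]
chosen = select_task(p_success, similarity, beta=1000.0, rng=random.Random(0))
```

With a large `beta` the sampler behaves almost greedily and picks the intermediate task here; a smaller `beta` spreads probability over more tasks. In practice the success probabilities would have to be estimated from the agent's rollouts or value function, which this sketch leaves out.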
Taxonomy
Topics: Reinforcement Learning in Robotics · Teaching and Learning Programming
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · REINFORCE · Balanced Selection
