EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL
Thomas Carta, Pierre-Yves Oudeyer, Olivier Sigaud, Sylvain Lamprier

TL;DR
This paper introduces EAGER, a method that uses question generation and question answering to automatically shape rewards in language-guided reinforcement learning, improving sample efficiency without manually designed auxiliary objectives.
Contribution
The paper presents an automated reward shaping technique that uses question generation (QG) and question answering (QA) systems to extract auxiliary objectives from the language goal in RL.
Findings
Improves sample efficiency in language-conditioned RL tasks.
Does not require manual design of auxiliary objectives.
Enhances exploration by guiding the agent with intrinsic rewards.
Abstract
Reinforcement learning (RL) in long horizon and sparse reward tasks is notoriously difficult and requires a lot of training steps. A standard solution to speed up the process is to leverage additional reward signals, shaping it to better guide the learning process. In the context of language-conditioned RL, the abstraction and generalisation properties of the language input provide opportunities for more efficient ways of shaping the reward. In this paper, we leverage this idea and propose an automated reward shaping method where the agent extracts auxiliary objectives from the general language goal. These auxiliary objectives use a question generation (QG) and question answering (QA) system: they consist of questions leading the agent to try to reconstruct partial information about the global goal using its own trajectory. When it succeeds, it receives an intrinsic reward proportional…
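The mechanism described in the abstract can be illustrated with a minimal sketch: questions are derived from the language goal, a QA step tries to answer them from the agent's trajectory, and each correct answer adds an intrinsic reward to the environment reward. Everything below is an assumption made for illustration, not the authors' implementation: the masking-based `generate_questions`, the word-alignment `toy_qa`, the fixed `bonus` constant (the abstract's description of what the reward is proportional to is truncated above, so no attempt is made to reproduce it), and the rule that each question is rewarded at most once.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Question:
    text: str    # goal with one word replaced by the mask token "<q>"
    answer: str  # the word that was masked out


def generate_questions(goal: str, maskable: Set[str]) -> List[Question]:
    """Toy QG (hypothetical): mask each maskable goal word in turn."""
    words = goal.split()
    questions = []
    for i, word in enumerate(words):
        if word in maskable:
            masked = words[:i] + ["<q>"] + words[i + 1:]
            questions.append(Question(" ".join(masked), word))
    return questions


def toy_qa(question: Question, trajectory: List[str]) -> Optional[str]:
    """Toy QA (hypothetical): find a trajectory event that matches the
    question on every non-masked word and return the word at the mask."""
    q_words = question.text.split()
    mask_pos = q_words.index("<q>")
    for event in trajectory:
        e_words = event.split()
        if len(e_words) != len(q_words):
            continue
        if all(e == q for i, (e, q) in enumerate(zip(e_words, q_words))
               if i != mask_pos):
            return e_words[mask_pos]
    return None  # the trajectory does not contain enough information


def shaped_reward(env_reward: float, questions: List[Question],
                  answered: Set[int], trajectory: List[str],
                  bonus: float = 0.1) -> float:
    """Add a fixed bonus (assumed value) for each question that is newly
    answered correctly from the trajectory; mutates `answered`."""
    intrinsic = 0.0
    for idx, question in enumerate(questions):
        if idx in answered:
            continue  # assumption: each question is rewarded only once
        if toy_qa(question, trajectory) == question.answer:
            answered.add(idx)
            intrinsic += bonus
    return env_reward + intrinsic


if __name__ == "__main__":
    qs = generate_questions("pick up the red key", {"red", "key"})
    done: Set[int] = set()
    events = ["open the door"]
    print(shaped_reward(0.0, qs, done, events))
    events.append("pick up the red key")
    print(shaped_reward(0.0, qs, done, events))
```

In this sketch the agent receives no bonus while its trajectory cannot answer any question, and receives one bonus per question once an event in the trajectory lets the QA step recover the masked goal word. A learned QG and QA model would replace the two toy functions.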
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
