Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning
Sayan Ghosh, Shashank Srivastava

TL;DR
This paper introduces an inverse reinforcement learning approach with interpretable reward components for mapping natural language instructions to programs. Compared with existing methods, it improves performance and data efficiency, and humans prefer the programs it generates.
Contribution
It proposes a framework that jointly learns a reward function, formed as a linear combination of multiple interpretable reward components, and a policy for generating programs from natural language.
Findings
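Up to 9.0% improvement on the Longest Common Subsequence (LCS) metric on VirtualHome
Up to 14.7% improvement on recall-based metrics over previous work (Puig et al., 2018)
Generated programs preferred by humans over those from RL-based methods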
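
For readers unfamiliar with the LCS metric named above, the following is a minimal sketch of how a normalized longest-common-subsequence score between a predicted and a reference program could be computed. The function names, the example steps, and the choice to normalize by the longer program's length are assumptions made for illustration; they are not taken from the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def normalized_lcs(predicted, reference):
    """LCS length divided by the longer program's length (assumed normalization)."""
    longest = max(len(predicted), len(reference))
    # Two empty programs are treated as a perfect match.
    return lcs_length(predicted, reference) / longest if longest else 1.0


# Hypothetical program steps, for illustration only.
predicted = ["walk kitchen", "open fridge", "grab milk"]
reference = ["walk kitchen", "find fridge", "open fridge", "grab milk"]
score = normalized_lcs(predicted, reference)  # 3 shared steps out of 4
```

Because a subsequence preserves order but allows gaps, this score rewards programs that perform the right steps in the right order even when some steps are missing.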
Abstract
Mapping natural language instructions to programs that computers can process is a fundamental challenge. Existing approaches focus on likelihood-based training or using reinforcement learning to fine-tune models based on a single reward. In this paper, we pose program generation from language as Inverse Reinforcement Learning. We introduce several interpretable reward components and jointly learn (1) a reward function that linearly combines them, and (2) a policy for program generation. Fine-tuning with our approach achieves significantly better performance than competitive methods using Reinforcement Learning (RL). On the VirtualHome framework, we get improvements of up to 9.0% on the Longest Common Subsequence metric and 14.7% on recall-based metrics over previous work on this framework (Puig et al., 2018). The approach is data-efficient, showing larger gains in performance in the…
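
The reward function described in the abstract linearly combines several interpretable components. A minimal sketch of that combination step is shown here; the component values and weights are hypothetical, and the sketch covers only the weighted sum, not how the weights or the policy are learned.

```python
def combined_reward(weights, components):
    """Weighted sum of reward component values for one generated program."""
    if len(weights) != len(components):
        raise ValueError("weights and components must have the same length")
    return sum(w * c for w, c in zip(weights, components))


# Hypothetical values: three component scores for a single program,
# and one weight per component. In the approach described above the
# weights would be learned; here they are fixed by hand.
component_values = [0.5, 1.0, 0.0]
weights = [2.0, 1.0, 0.5]
reward = combined_reward(weights, component_values)  # 2.0*0.5 + 1.0*1.0 + 0.5*0.0
```

Keeping the combination linear means each learned weight can be read directly as the importance assigned to its component, which is what makes the reward interpretable.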
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Reinforcement Learning in Robotics
