Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs
Nicholas Potteiger, Ankita Samaddar, Taylor T. Johnson, Xenofon Koutsoukos

TL;DR
This paper introduces the masking reward behavior tree (MRBT), a symbolic structure that serves as both a reward-shaping and an action-masking function. MRBTs are generated with LLMs and checked with formal verification to improve reinforcement learning on complex, compositional tasks.
Contribution
The authors develop MRBT, a modular, reactive framework for reward shaping and action masking whose trees are verified with SMT solvers, improving RL training efficiency and robustness on object-interaction tasks.
Findings
MRBTs were successfully generated and refined for five different tasks.
Training efficiency and success rates improved over baselines.
MRBTs demonstrated transferability, modularity, and verifiability.
Abstract
Decomposing complex tasks into a sequence of simpler subtasks can improve learning efficiency for an autonomous agent. Reinforcement learning (RL) can be used to optimize agent policies to complete subtasks, but requires well-defined subtask rewards and benefits from action masking. Recent work uses large language models (LLMs) to automate reward shaping and action masking; however, none of these approaches fully addresses reactivity to subtask failure and modularity across varying objects in compositional tasks. To overcome these challenges, we develop the masking reward behavior tree (MRBT), a symbolic structure used as a reactive and modular reward and action mask function. We design an MRBT template and derive logical specifications to construct and verify MRBTs for a sequence of object-interaction subtasks. Further, we develop an automated pipeline that uses an LLM to generate MRBTs robust to varying…
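The abstract describes the MRBT only at a high level. As an illustration, here is a minimal Python sketch of a behavior tree that returns a shaped reward and an action mask at every step. Everything in it is an assumption made for illustration, not the authors' MRBT template: the hypothetical pick-and-place task, the predicates such as `at_cube`, `holding_cube`, and `placed_cube`, the four-action set, the stage-indexed reward values, and the helper names `Subtask`, `pick_and_place`, and `evaluate`. Re-ticking from the root on every step gives reactivity (a dropped object sends the agent back to the grasp subtask), and the object-parameterized subtree gives modularity. The final loop exhaustively enumerates all predicate assignments as a simple stand-in for SMT-based verification; it does not reproduce the paper's logical specifications.

```python
"""Minimal sketch of a reward-and-mask behavior tree (hypothetical design)."""
from itertools import product
from typing import Dict, FrozenSet, Optional, Tuple

SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"
ACTIONS = frozenset({"move_to", "grasp", "release", "noop"})  # hypothetical action set

State = Dict[str, bool]
Tick = Tuple[str, float, Optional[FrozenSet[str]]]


class Condition:
    """Checks a boolean predicate of the state; carries no reward or mask."""
    def __init__(self, pred: str):
        self.pred = pred

    def tick(self, s: State) -> Tick:
        return (SUCCESS if s.get(self.pred, False) else FAILURE), 0.0, None


class Subtask:
    """Active subtask leaf: always RUNNING, emitting its shaped reward and action mask."""
    def __init__(self, reward: float, allowed: FrozenSet[str]):
        self.reward, self.allowed = reward, allowed

    def tick(self, s: State) -> Tick:
        return RUNNING, self.reward, self.allowed


class Sequence:
    """Ticks children in order; stops at the first child that is not SUCCESS."""
    def __init__(self, *children):
        self.children = children

    def tick(self, s: State) -> Tick:
        for child in self.children:
            result = child.tick(s)
            if result[0] != SUCCESS:
                return result
        return SUCCESS, 0.0, None


class Fallback:
    """Ticks children in order; stops at the first child that is not FAILURE."""
    def __init__(self, *children):
        self.children = children

    def tick(self, s: State) -> Tick:
        for child in self.children:
            result = child.tick(s)
            if result[0] != FAILURE:
                return result
        return FAILURE, 0.0, None


def pick_and_place(obj: str, base: float):
    """Modular subtree: the same template is reused for any object name."""
    reach = Subtask(base + 0.0, frozenset({"move_to"}))
    grasp = Subtask(base + 1.0, frozenset({"grasp"}))
    place = Subtask(base + 2.0, frozenset({"move_to", "release"}))
    return Fallback(
        Condition(f"placed_{obj}"),
        Sequence(
            Fallback(
                Condition(f"holding_{obj}"),
                Sequence(Fallback(Condition(f"at_{obj}"), reach), grasp),
            ),
            place,
        ),
    )


def evaluate(tree, s: State, done_reward: float) -> Tuple[float, FrozenSet[str]]:
    """Re-tick from the root every step, so the output reacts to subtask failure."""
    status, reward, mask = tree.tick(s)
    if status == SUCCESS:
        return done_reward, frozenset({"noop"})
    if status == FAILURE or mask is None:
        raise ValueError("tree produced no active subtask")
    return reward, mask


tree = Sequence(pick_and_place("cube", 0.0), pick_and_place("ball", 3.0))
state = {"at_cube": True, "holding_cube": True}
print(evaluate(tree, state, done_reward=6.0))  # place stage of the cube subtree
state["holding_cube"] = False  # object dropped: tree falls back to the grasp subtask
print(evaluate(tree, state, done_reward=6.0))  # (1.0, frozenset({'grasp'}))

# Exhaustive stand-in for an SMT check: over every truth assignment of the six
# predicates, the mask is a non-empty subset of ACTIONS and reward <= done_reward.
preds = [f"{p}_{o}" for o in ("cube", "ball") for p in ("at", "holding", "placed")]
for values in product([False, True], repeat=len(preds)):
    r, m = evaluate(tree, dict(zip(preds, values)), done_reward=6.0)
    assert m and m <= ACTIONS and r <= 6.0
print(f"checked {2 ** len(preds)} states")  # checked 64 states
```

Exhaustive enumeration grows as 2^k in the number of predicates, so it only suits toy trees like this one; an SMT solver can check such properties symbolically instead.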
