SCALAR: Learning and Composing Skills through LLM Guided Symbolic Planning and Deep RL Grounding
Renos Zabounidis, Yue Wu, Simon Stepputtis, Woojun Kim, Yuanzhi Li, Tom Mitchell, Katia Sycara

TL;DR
SCALAR is a bidirectional framework that combines LLM planning with reinforcement learning, using a learned skill library and feedback mechanisms to improve low-level control and robustness in complex tasks.
Contribution
SCALAR iteratively couples LLM-based skill planning with RL training: execution feedback and trajectory analysis refine the skill specifications and improve skill grounding.
Findings
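Achieves 88.2% diamond collection on Craftax, a 1.9x improvement over the best baseline.
Reaches the Gnomish Mines 9.1% of the time, where prior methods fail.
Improves robustness to initial specification errors through execution feedback and Pivotal Trajectory Analysis, and improves sample efficiency through optional Frontier Checkpointing.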
Abstract
LLM-based agents excel when given high-level action APIs but struggle to ground language into low-level control. Prior work has LLMs generate skills or reward functions for RL, but these one-shot approaches lack feedback to correct specification errors. We introduce SCALAR, a bidirectional framework coupling LLM planning with RL through a learned skill library. The LLM proposes skills with preconditions and effects; RL trains policies for each skill and feeds back execution results to iteratively refine specifications, improving robustness to initial errors. Pivotal Trajectory Analysis corrects LLM priors by analyzing RL trajectories; Frontier Checkpointing optionally saves environment states at skill boundaries to improve sample efficiency. On Craftax, SCALAR achieves 88.2% diamond collection, a 1.9x improvement over the best baseline, and reaches the Gnomish Mines 9.1% of the time…
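To make the idea of skills with preconditions and effects concrete, here is a minimal sketch of how such specifications might be composed into a plan and then corrected after execution feedback. It is not SCALAR's implementation: the `Skill` class, the breadth-first `plan` function, the `refine` helper, and the skill and fact names (`collect_wood`, `pickaxe`, and so on) are all hypothetical, chosen only for illustration. RL training is not modelled here; the "feedback" is simply a precondition assumed to have been discovered from failed rollouts.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Skill:
    """Symbolic skill spec: facts required before, facts true after."""
    name: str
    preconditions: frozenset
    effects: frozenset


def plan(skills, start, goal):
    """Breadth-first search over sets of facts; returns the shortest
    sequence of skill names reaching the goal, or None if unreachable."""
    start = frozenset(start)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if goal <= state:
            return path
        for skill in skills:
            if skill.preconditions <= state:
                nxt = state | skill.effects
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [skill.name]))
    return None


def refine(skill, missing_preconditions):
    """Stand-in for the feedback step: add preconditions that execution
    results (assumed here, not simulated) showed were actually required."""
    return replace(
        skill,
        preconditions=skill.preconditions | frozenset(missing_preconditions),
    )


# Hypothetical initial proposal with a specification error:
# collect_diamond is missing its tool requirement.
skills = [
    Skill("collect_wood", frozenset(), frozenset({"wood"})),
    Skill("make_pickaxe", frozenset({"wood"}), frozenset({"pickaxe"})),
    Skill("collect_diamond", frozenset(), frozenset({"diamond"})),
]

naive_plan = plan(skills, set(), {"diamond"})

# Assume execution feedback revealed that a pickaxe is needed.
skills[2] = refine(skills[2], {"pickaxe"})
refined_plan = plan(skills, set(), {"diamond"})

print(naive_plan)
print(refined_plan)
```

With the faulty specification the planner jumps straight to `collect_diamond`; once the missing precondition is added, the plan includes the prerequisite skills. A one-shot approach would be stuck with the first plan, which is the failure mode the abstract says the feedback loop is meant to address.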
Taxonomy
Topics: Reinforcement Learning in Robotics · AI-based Problem Solving and Planning · Robot Manipulation and Learning
