SLAP: Shortcut Learning for Abstract Planning
Y. Isabel Liu, Bowen Li, Benjamin Eysenbach, and Tom Silver

TL;DR
SLAP enhances abstract planning in AI and robotics by automatically discovering shortcuts in task hierarchies using reinforcement learning, leading to shorter solutions and higher success rates in complex tasks.
Contribution
SLAP introduces a novel method that leverages existing TAMP options with RL to automatically find effective shortcuts, improving planning efficiency and success.
Findings
Reduces plan lengths by over 50%
Outperforms planning and RL baselines in success rates
Discovers dynamic physical improvisations
Abstract
Long-horizon decision-making with sparse rewards and continuous states and actions remains a fundamental challenge in AI and robotics. Task and motion planning (TAMP) is a model-based framework that addresses this challenge by planning hierarchically with abstract actions (options). These options are manually defined, limiting the agent to behaviors that we as human engineers know how to program (pick, place, move). In this work, we propose Shortcut Learning for Abstract Planning (SLAP), a method that leverages existing TAMP options to automatically discover new ones. Our key idea is to use model-free reinforcement learning (RL) to learn shortcuts in the abstract planning graph induced by the existing options in TAMP. Without any additional assumptions or inputs, shortcut learning leads to shorter solutions than pure planning, and higher task success rates than flat and hierarchical RL.…
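The key idea in the abstract, adding learned shortcut edges to the abstract planning graph induced by existing options, can be illustrated with a small sketch. This is not the authors' implementation: the node names (`s0` to `s3`), the option names, the edge costs, and the `shortest_plan` helper are all hypothetical, and the learned RL policy is represented only by a named edge with an assumed execution cost.

```python
import heapq


def shortest_plan(edges, start, goal):
    """Dijkstra search over an abstract planning graph.

    edges maps an abstract state to a list of
    (next_state, option_name, cost) tuples.
    Returns (total_cost, [option_name, ...]) or None if goal is unreachable.
    """
    frontier = [(0.0, start, [])]
    best = {start: 0.0}
    while frontier:
        cost, node, plan = heapq.heappop(frontier)
        if node == goal:
            return cost, plan
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, option, step_cost in edges.get(node, []):
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, plan + [option]))
    return None


# Hypothetical graph induced by hand-written options (costs are made up).
base_edges = {
    "s0": [("s1", "pick", 10.0)],
    "s1": [("s2", "move", 10.0)],
    "s2": [("s3", "place", 10.0)],
}

# Assume an RL policy was trained to go directly from s0 to s3;
# it enters the graph as one extra edge with its own (assumed) cost.
shortcut_edges = {state: list(out) for state, out in base_edges.items()}
shortcut_edges["s0"].append(("s3", "shortcut_s0_to_s3", 8.0))

plan_without = shortest_plan(base_edges, "s0", "s3")
plan_with = shortest_plan(shortcut_edges, "s0", "s3")
print(plan_without)
print(plan_with)
```

In this toy setup the original options remain in the graph, so the planner can still fall back on them wherever no shortcut was learned; the shortcut is used only when it lowers the plan cost.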
Peer Reviews
Decision: ICLR 2026 Poster
Framing and practical value
- Clear problem statement that existing TAMP systems rely on hand-designed skills, which limits efficiency and expressivity. SLAP focuses squarely on reducing execution time without discarding the benefits of abstraction and search.
- Elegant algorithmic design that keeps planning and learning modular. The abstract planning graph yields a well-defined search problem, and shortcuts are learned in parallel MDPs with simple goal conditions and dense step penalties.
- Rela
Assumptions and scope
- Relies on a known transition function and fully observable, deterministic settings for graph construction, although variants relax these assumptions. Real systems often face sensing delays, latency, and controller noise, which may require tighter integration of failure recovery and uncertainty-aware planning.
- Assumes the provided option set enables task completion. When this is not true, completeness can be lost. Appendix results discuss such cases but a stronger treatmen
The paper is well written and easy to read; in particular, Fig. 1 provides a good illustration of the desired shortcuts and Fig. 3 gives a clear overview of the method. Technically, the proposed method for generalizing over objects and object counts provides, to the best of my knowledge, an interesting and novel way to perform downstream adaptation. Additionally, the experimental results strongly support the proposed method, which outperforms vanilla TAMP and hierarchical RL methods.
My major question is around the novelty of the proposed method. There are many skill learning methods where people learn skills by specifying a goal state (and optionally a start state) with RL, and then apply such learned skills to downstream planning or HRL. If I understand correctly, the proposed method is very similar to those methods, except that the start and goal states come from the planning graph. Is that right? Any novelty I missed? Meanwhile, I wonder what if the learned shortcut ha
- The intuition of the paper (discovering shortcut connections in an abstract planning graph) is conceptually sound and easy to grasp. It is a simple yet effective idea that directly addresses the inefficiency of long hierarchical plans in TAMP frameworks.
- The approach can produce genuinely new high-level actions that are not part of the manually defined option set (e.g., slap). This demonstrates the potential of the method to extend the agent’s action set beyond what is explicitly encoded by
A primary weakness of this work lies in its conceptual alignment with the TAMP paradigm and the choice of experimental baselines. 1. Conceptual Dissonance with TAMP: The core philosophy of TAMP is to find plans that are valid with respect to a given symbolic model, ensuring that high-level action sequences are grounded and physically feasible according to predefined rules. The proposed method, SLAP, learns "shortcut" policies (e.g., "slapping" a tower) that achieve a goal state by operating out
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · AI-based Problem Solving and Planning
