Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition
Taylor Bergeron, Zachary Serlin, Kevin Leahy

TL;DR
Comp-LTL introduces a zero-shot method for agents to satisfy arbitrary LTL specifications by composing pre-trained task primitives, eliminating the need for retraining when goals change, and ensuring safe, feasible solutions.
Contribution
This paper presents a novel zero-shot approach to LTL planning that uses composable task primitives and transition system pruning, enabling flexible and safe goal satisfaction without retraining.
Findings
Comp-LTL achieves safer task execution compared to existing methods.
The approach effectively handles arbitrary LTL specifications without retraining.
Simulation results demonstrate improved adaptability and correctness.
Abstract
This work develops a zero-shot mechanism, Comp-LTL, for an agent to satisfy a Linear Temporal Logic (LTL) specification given existing task primitives trained via reinforcement learning (RL). Autonomous robots often need to satisfy spatial and temporal goals that are unknown until run time. Prior work focuses on learning policies for executing a task specified using LTL, but they incorporate the specification into the learning process. Any change to the specification requires retraining the policy, either via fine-tuning or from scratch. We present a more flexible approach -- to learn a set of composable task primitive policies that can be used to satisfy arbitrary LTL specifications without retraining or fine-tuning. Task primitives can be learned offline using RL and combined using Boolean composition at deployment. This work focuses on creating and pruning a transition system (TS)…
Peer Reviews
Decision·Submitted to ICLR 2026
## Originality The work is grounded quite closely to the literature on using LTL specifications for temporal and spatial task composition. This is not inherently bad, and in fact by positioning the work clearly against these prior works it does make the differences stand out. ## Quality The motivation of the work is clear and the hypothesis is grounded in prior work. The results that are presented are interpreted fairly. ## Clarity The paper is well written and figures are clear and useful ove
## Clarity Firstly the minor concern, the figure captions are very uninformative and this limits the benefit of the figures substantially. Figure 3a in particular really needs to be more descriptive both in terms of the caption and figure itself. The work also uses jargon which is not sufficiently defined such as "sound". When a word is used in this manner to mean something technical it is necessary to define it formally. More importantly, it is very difficult for me to see the connection betwe
1. The integration of deterministic TS pruning and Boolean policy composition for zero-shot LTL satisfaction is original and well-motivated. 2. The paper provides clear theorems (determinism, soundness, feasibility) with proof sketches, demonstrating a solid theoretical foundation. 3. Comp-LTL requires no fine-tuning or retraining, showing strong adaptability to unseen specifications in grid-world environments.
1. The Q-learning algorithms used in the paper are value-based, discrete-action algorithms — meaning they assume a finite, enumerable action space. Authors claim that "Our approach agnostic to the method in which the policies are trained, as we show Comp-LTL is successful with both tabular Q-learning and DQN policies." but the the Q-learning-based primitives in Comp-LTL cannot be directly applied to continuous-action robotic tasks. The experiments are confined to grid-based environments. Claims
- The paper is well written, and the contribution is clear. - The proposed idea is novel. In particular, I find interesting the connection between temporal logic specification and zero-shot composition of policies for the similarity with the multi-task problem. - The authors formally proove the soundness of the propose pruned transition system (TS). - While in part I find the empirical evaluation limited, it shows promiseing results, and highlight that the the proposed framework can satisfy new
- The main concern is scalability. Constructing the transition system in realistic or continuous domains is likely intractable, especially since it requires identifying the regions where each atomic proposition holds. - The need to train one policy for each atomic proposition $\sigma \in \Sigma$ does not scale well as the number of labels grows. - As already anticapted, emprical evaluation is limited to toy domains. It is difficult to understand whether this method can be applied in more realist
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification · Logic, Reasoning, and Knowledge · Model-Driven Software Engineering Techniques
