LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics

Anand Gokhale; Vaibhav Srivastava; Francesco Bullo

arXiv:2507.03293·cs.AI·September 24, 2025

Open Access · 3 Reviews

TL;DR

LogicGuard enhances LLM-based embodied agents by integrating a temporal logic critic that guides high-level decision making, improving safety, efficiency, and reliability in complex long-horizon tasks.

Contribution

We propose a modular actor-critic framework where an LLM critic uses Linear Temporal Logic to guide and improve the decision-making of LLM actors in embodied tasks.

Findings

01

Increased task completion rates by 25% on household tasks.

02

Improved efficiency and safety in long-horizon Minecraft tasks.

03

Demonstrated generality across different task settings.

Abstract

Large language models (LLMs) have shown promise in zero-shot and single-step reasoning and decision-making problems, but in long-horizon sequential planning tasks their errors compound, often leading to unreliable or inefficient behavior. We introduce LogicGuard, a modular actor-critic architecture in which an LLM actor is guided by a trajectory-level LLM critic that communicates through Linear Temporal Logic (LTL). Our setup combines the reasoning strengths of language models with the guarantees of formal logic. The actor selects high-level actions from natural language observations, while the critic analyzes full trajectories and proposes new LTL constraints that shield the actor from future unsafe or inefficient behavior. LogicGuard supports both fixed safety rules and adaptive, learned constraints, and is model-agnostic: any LLM-based planner can serve as the actor, with LogicGuard…
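
As a rough illustration of the shielding idea described in the abstract, the sketch below filters an actor's candidate actions against constraints checked over the trajectory so far. It is not the authors' implementation: the names `Constraint`, `never`, `before`, and `shield`, the action labels, and the two-pattern LTL fragment evaluated on finite traces are all assumptions made for this example. In the paper's setting an LLM critic would propose the constraints; here they are written by hand.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# All names here (Step, Constraint, never, before, shield) are hypothetical.
Step = str  # a high-level action label, e.g. "open_fridge"


@dataclass(frozen=True)
class Constraint:
    """A finite-trace check standing in for one LTL formula."""
    formula: str                             # human-readable LTL text
    holds: Callable[[Sequence[Step]], bool]  # True if the trace does not violate it


def never(action: Step) -> Constraint:
    # G !action : the action must not occur anywhere in the trace.
    return Constraint(f"G !{action}", lambda trace: action not in trace)


def before(first: Step, then: Step) -> Constraint:
    # (!then) W first : `then` may not occur until `first` has occurred.
    def holds(trace: Sequence[Step]) -> bool:
        seen_first = False
        for step in trace:
            if step == first:
                seen_first = True
            elif step == then and not seen_first:
                return False
        return True
    return Constraint(f"(!{then}) W {first}", holds)


def shield(trace: Sequence[Step], candidates: Sequence[Step],
           constraints: Sequence[Constraint]) -> List[Step]:
    """Keep only candidates whose one-step extension violates no constraint."""
    return [a for a in candidates
            if all(c.holds([*trace, a]) for c in constraints)]


# Hand-written constraints standing in for what a critic might propose.
constraints = [never("touch_hot_stove"), before("open_fridge", "take_milk")]

allowed = shield([], ["take_milk", "open_fridge", "touch_hot_stove"], constraints)
allowed_next = shield(["open_fridge"], ["take_milk", "touch_hot_stove"], constraints)
print(allowed, allowed_next)
```

In this sketch the actor would pick only from the list `shield` returns, so a constraint added after one failed trajectory keeps restricting every later step. A full LTL monitor would also have to handle liveness formulas, which a one-step violation check like this cannot decide.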

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The problem addressed by the paper is interesting: the authors aim to automatically add and verify constraints for embodied LLM agents.
- The integration of LTL-based symbolic constraints with LLM-driven planning is practically interesting.
- The experimental evaluation on two domains (Behavior and Minecraft) demonstrates empirical improvements in completion and efficiency rates.

Weaknesses

- The contribution is primarily empirical rather than methodological. The proposed framework lacks formal or algorithmic novelty beyond combining existing elements (LLM planners, constraint checking, and symbolic reasoning).
- The entire pipeline relies heavily on LLMs, which introduces noise and a risk of hallucination at multiple stages, such as: (i) _State grounding_, (ii) _Constraint generation_, which relies on natural language rules induced by an LLM, and (iii) _Constraint translation_, where

Reviewer 02 · Rating 6 · Confidence 5

Strengths

1. Novel Integration of Formal Methods with LLMs: The use of LTL as a communication protocol between actor and critic is innovative and addresses a critical gap in LLM-based planning. Unlike prior work that relies on natural language feedback, LTL provides verifiable, machine-checkable constraints with formal guarantees, and this work goes further by using an online actor-critic.
2. Strong Empirical Results Across Diverse Settings: The paper demonstrates substantial improvements in both generalist (Behavior: 47%->7

Weaknesses

1. Missing ablations and baselines: There is no direct comparison between online and offline critic modes, which is crucial for understanding trade-offs in adaptability and computational overhead. Baseline coverage (e.g., other LLM-based critics or symbolic guardrail methods) is limited. What is the contribution of each critic source (environment feedback, graph-based efficiency, over-constrained states)?
2. Evaluation scale and statistical rigor: Reported experiments appear based on small sample si

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The system combines symbolic reasoning with the generalization ability of LLMs.
2. Every constraint has a verbalized explanation, helping users inspect the agent’s decision process.

Weaknesses

1. Offline critic analysis and frequent rule updates require repeated LLM calls, which can be expensive and potentially unstable.
2. This work hand-crafts a large number of rules and complex prompts. This constrains the environment and lowers the task difficulty.

Taxonomy

Topics: AI-based Problem Solving and Planning · Multimodal Machine Learning Applications · Artificial Intelligence in Games