TL;DR
This paper introduces a physical agentic loop for language-guided robotic grasping that enhances robustness and interpretability by monitoring execution states and enabling bounded recovery, validated on a mobile manipulator.
Contribution
It reformulates language-guided grasping as an embodied agent with explicit execution-state monitoring and recovery, improving robustness without changing the underlying grasp model.
Findings
Explicit execution-state monitoring improves robustness.
Bounded recovery enables finite, interpretable behavior.
The loop adds minimal overhead to existing grasp models.
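The bounded-recovery finding can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Outcome`, `run_bounded`, and `max_attempts` are hypothetical, and the outcome labels are only loosely based on the failure types listed in the abstract. The sketch assumes the grasp primitive is an opaque callable that reports one discrete outcome per attempt.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Outcome(Enum):
    # Hypothetical labels, loosely following the failure types in the abstract.
    SUCCESS = "success"
    EMPTY = "empty"
    SLIP = "slip"
    STALL = "stall"
    TIMEOUT = "timeout"


def run_bounded(
    attempt: Callable[[int], Outcome],
    max_attempts: int = 3,
) -> Tuple[bool, List[Outcome]]:
    """Call an unmodified primitive at most `max_attempts` times.

    Returns (succeeded, trace). The trace records every outcome, so the
    behavior is finite and can be inspected afterwards.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    trace: List[Outcome] = []
    for index in range(max_attempts):
        outcome = attempt(index)  # the primitive itself is not changed
        trace.append(outcome)
        if outcome is Outcome.SUCCESS:
            return True, trace
    # Attempt budget exhausted: stop rather than retry indefinitely.
    return False, trace
```

Because the loop is capped by `max_attempts`, it always terminates, and the returned trace gives a step-by-step account of what was observed on each attempt.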
Abstract
Robotic manipulation systems that follow language instructions often execute grasp primitives in a largely single-shot manner: a model proposes an action, the robot executes it, and failures such as empty grasps, slips, stalls, timeouts, or semantically wrong grasps are not surfaced to the decision layer in a structured way. Inspired by agentic loops in digital tool-using agents, we reformulate language-guided grasping as a bounded embodied agent operating over grounded execution states, where physical actions expose an explicit tool-state stream. We introduce a physical agentic loop that wraps an unmodified learned manipulation primitive (grasp-and-lift) with (i) an event-based interface and (ii) an execution monitoring layer, Watchdog, which converts noisy gripper telemetry into discrete outcome labels using contact-aware fusion and temporal stabilization. These outcome events,…
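As a rough illustration of how noisy gripper telemetry might be turned into discrete outcome labels, the Python sketch below combines a contact flag with gripper width and then requires several consecutive agreeing samples before committing to a label. This is an assumption-laden toy, not the paper's Watchdog: the `Sample` fields, the `closed_mm` threshold, the label strings, and the `hold` count are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    width_mm: float  # hypothetical gripper opening reading
    contact: bool    # hypothetical contact-sensor flag


def raw_label(sample: Sample, closed_mm: float = 2.0) -> str:
    """Fuse contact and width into a per-sample label (assumed rule)."""
    if sample.contact and sample.width_mm > closed_mm:
        return "holding"    # fingers touch something and are held apart
    if sample.width_mm <= closed_mm:
        return "empty"      # fingers closed fully, nothing between them
    return "uncertain"      # open without contact: no decision yet


def stabilize(samples: Iterable[Sample], hold: int = 3) -> Optional[str]:
    """Return the first decisive label seen `hold` times in a row, else None."""
    last: Optional[str] = None
    run = 0
    for sample in samples:
        label = raw_label(sample)
        if label == last:
            run += 1
        else:
            last, run = label, 1
        if label != "uncertain" and run >= hold:
            return label
    return None
```

Requiring `hold` consecutive agreeing samples means a single spurious reading resets the count instead of producing an outcome event, which is one simple way to stabilize a label over time.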
