Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving
Oussama Zenkri, Oliver Brock

TL;DR
This study probes embodied LLM agents on a robotic puzzle task and finds that higher observation fidelity can hurt performance, while moderate perceptual noise unexpectedly raises success rates.
Contribution
The paper shows empirically that imperfect perception can sometimes improve LLM-based robotic problem solving, challenging the assumption that higher-quality observations always help.
Findings
Agents perform best with raw RGB input and worst with perfect ground-truth data.
In simulation, moderate perception noise increases success rates, peaking at a 40% probability of flipping the perceived action outcome.
Performance improvements are linked to reduced repetitive action loops.
Abstract
Large Language Models are increasingly proposed as cognitive components for robotic systems, yet their opaque decision processes make it difficult to explain success or failure in closed-loop embodied tasks. Following an empirical AI methodology, we study embodied LLM agents behaviorally by varying the information available to the agent and measuring the resulting changes in behavior. Using the Lockbox, a sequential mechanical puzzle with hidden interdependencies, we evaluate LLMs across RGB, RGB-D, and ground-truth symbolic observations in a physical robotic setup and use controlled simulation to probe the resulting behavior. Counterintuitively, agents perform best under raw RGB input and worst under perfect ground-truth observations. In simulation, we probe this effect by randomly flipping perceived action outcomes and find that moderate noise improves performance, peaking at a 40%…
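The noise probe described in the abstract, randomly flipping perceived action outcomes, can be illustrated with a minimal sketch. It assumes each action outcome is a single success/failure bit and that flips are independent across steps; the names `perceive_outcome`, `flip_prob`, and `empirical_flip_rate` are hypothetical and are not taken from the paper, whose actual simulation may differ.

```python
import random


def perceive_outcome(true_success: bool, flip_prob: float, rng: random.Random) -> bool:
    """Return the outcome reported to the agent.

    With probability flip_prob the true outcome is inverted (assumption:
    outcomes are binary and flips are independent per step).
    """
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    flipped = rng.random() < flip_prob
    return (not true_success) if flipped else true_success


def empirical_flip_rate(flip_prob: float, trials: int = 10_000, seed: int = 0) -> float:
    """Estimate how often a true success is reported as a failure."""
    rng = random.Random(seed)
    flips = sum(
        perceive_outcome(True, flip_prob, rng) is False for _ in range(trials)
    )
    return flips / trials


if __name__ == "__main__":
    # Sweep a few noise levels, including the 40% level the summary highlights.
    for p in (0.0, 0.2, 0.4, 0.6):
        print(f"flip_prob={p:.1f} observed flip rate={empirical_flip_rate(p):.3f}")
```

In a closed-loop experiment, a wrapper like this would sit between the simulator's true action result and the text observation passed to the LLM, so that only the reported outcome changes while the underlying environment state stays correct.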
