Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving

Oussama Zenkri; Oliver Brock

arXiv:2605.20072 · cs.AI · May 20, 2026

TL;DR

This study probes embodied LLM agents on a robotic puzzle task and finds that higher observation fidelity can impair performance, while moderate perception noise unexpectedly raises success rates.

Contribution

The paper shows empirically that imperfect perception can sometimes improve LLM-based robotic problem solving, challenging the assumption that higher-quality observations always help.

Findings

01. Agents perform best with raw RGB input and worst with perfect ground-truth symbolic observations.

02. Moderate perception noise, introduced in simulation by randomly flipping perceived action outcomes, can increase success rates, peaking at a 40% flip probability.

03. The performance improvements are linked to fewer repetitive action loops.
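
The third finding refers to repetitive action loops. This summary does not say how the paper measures them, so the following is only a minimal sketch of one way such repetition could be quantified from an action log. The function names `longest_repeat_run` and `repeat_fraction`, and the idea of representing actions as strings, are assumptions made for illustration, not the authors' metric.

```python
from itertools import groupby
from typing import Sequence


def longest_repeat_run(actions: Sequence[str]) -> int:
    """Length of the longest run of identical consecutive actions.

    Hypothetical loop indicator: a long run means the agent kept
    retrying the same action without changing strategy.
    """
    return max((sum(1 for _ in group) for _, group in groupby(actions)), default=0)


def repeat_fraction(actions: Sequence[str]) -> float:
    """Fraction of steps that repeat the immediately preceding action."""
    if len(actions) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(actions, actions[1:]) if prev == cur)
    return repeats / (len(actions) - 1)


if __name__ == "__main__":
    # Illustrative action log; the action names are made up.
    log = ["pull_a", "pull_a", "pull_a", "slide_b", "pull_a"]
    print(longest_repeat_run(log), repeat_fraction(log))
```

Comparing such scores between noise-free and noisy runs would be one way to check whether noise breaks up loops, but this only captures immediate repetition; longer cycles (for example A, B, A, B) would need a different measure.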

Abstract

Large Language Models are increasingly proposed as cognitive components for robotic systems, yet their opaque decision processes make it difficult to explain success or failure in closed-loop embodied tasks. Following an empirical AI methodology, we study embodied LLM agents behaviorally by varying the information available to the agent and measuring the resulting changes in behavior. Using the Lockbox, a sequential mechanical puzzle with hidden interdependencies, we evaluate LLMs across RGB, RGB-D, and ground-truth symbolic observations in a physical robotic setup and use controlled simulation to probe the resulting behavior. Counterintuitively, agents perform best under raw RGB input and worst under perfect ground-truth observations. In simulation, we probe this effect by randomly flipping perceived action outcomes and find that moderate noise improves performance, peaking at a 40%…
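
The simulated probe described in the abstract, randomly flipping perceived action outcomes, can be sketched as follows. This is a minimal illustration, assuming action outcomes are binary (success or failure) and that each outcome is flipped independently with a fixed probability. The names `perceive_outcome` and `empirical_flip_rate` are hypothetical and do not come from the paper.

```python
import random


def perceive_outcome(true_outcome: bool, flip_prob: float, rng: random.Random) -> bool:
    """Return the action outcome as the agent would perceive it.

    With probability flip_prob the true outcome is inverted; otherwise
    it is passed through unchanged. Binary outcomes and independent
    flips are assumptions of this sketch.
    """
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    if rng.random() < flip_prob:
        return not true_outcome
    return true_outcome


def empirical_flip_rate(flip_prob: float, n_trials: int, seed: int = 0) -> float:
    """Estimate how often a true success is reported to the agent as a failure."""
    rng = random.Random(seed)
    flips = sum(1 for _ in range(n_trials) if not perceive_outcome(True, flip_prob, rng))
    return flips / n_trials


if __name__ == "__main__":
    # 0.4 matches the flip probability named in the abstract.
    print(empirical_flip_rate(0.4, 10_000))
```

With `flip_prob=0.0` the agent sees the ground-truth outcome, and with `flip_prob=1.0` every outcome is inverted, so sweeping the value between those extremes gives the kind of noise axis the abstract describes.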
