Mini-BEHAVIOR-Gran: Revealing U-Shaped Effects of Instruction Granularity on Language-Guided Embodied Agents

Sukai Huang; Chenyuan Zhang; Fucai Ke; Zhixi Cai; Gholamreza Haffari; Lizhen Qu; Hamid Rezatofighi

arXiv:2604.17019·cs.AI·April 21, 2026

TL;DR

This paper introduces Mini-BEHAVIOR-Gran, a benchmark for studying how the level of instruction detail affects embodied agents. It reveals a U-shaped relationship between instruction granularity and performance, with agents doing best at both extremes of granularity.

Contribution

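The paper provides a benchmark with multiple instruction variants per task, ranging from high-level goal descriptions to step-by-step guidance. It compares four candidate granularity metrics (token count, entity count, action-verb count, and planning-width) and uncovers a non-monotonic effect of granularity on agent performance.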

Findings

01. Among the four candidate metrics, planning-width correlates most consistently with agent performance.

02. Agent performance shows a U-shaped relationship with instruction granularity.

03. Coarse instructions lead to vision-dominant policies.

Abstract

Instruction granularity is an important yet poorly controlled variable in language-guided embodied AI. Existing benchmarks typically pair each task with a single static instruction, making it difficult to study how agent behavior changes when the same task is described at different levels of detail. We introduce Mini-BEHAVIOR-Gran, a new benchmark for controlled studies of instruction granularity that extends Mini-BEHAVIOR with multiple instruction variants per task, ranging from high-level goal descriptions to step-by-step guidance. Using this benchmark, we compare four candidate metrics for cross-task granularity quantification: token count, entity count, action-verb count, and planning-width, and find that width correlates most consistently with agent performance. Using width to organize training and evaluation further reveals a non-monotonic U-shaped relationship between instruction…
