Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation

He-Yang Xu; Pengyuan Zhang; Zongyuan Ge; Xiaoshuai Hao; Serge Belongie; Xin Geng; Yuxin Peng; Xiu-Shen Wei

arXiv:2605.19986 · cs.RO · May 20, 2026

TL;DR

MetaFine is a diagnostic framework that dissects fine-grained manipulation skills into understanding, perception, and controlled behavior, revealing hidden model weaknesses and guiding targeted improvements.

Contribution

The paper introduces MetaFine, a diagnostic meta-evaluation framework that reconstructs heterogeneous external benchmarks into diagnostic scenarios under a unified protocol, exposing model failures and guiding targeted improvements.

Findings

01 Visual encoder quality is a key bottleneck for fine-grained manipulation.

02 MetaFine reveals severe dimension-specific failures in state-of-the-art models.

03 Hybrid real-sim validation improves physical benchmarking stability.

Abstract

Fine-grained manipulation marks a regime where global scene context no longer suffices, and success hinges on the tight coupling of local attribute grounding, high-fidelity spatial perception, and constraint-respecting motor execution. However, current embodied AI benchmarks collapse these capacities into binary success rates, systematically inflating reported capabilities by up to 70% and masking the architectural bottlenecks that impede real-world deployment. We introduce MetaFine, a diagnostic meta-evaluation framework that disentangles manipulation competency along three axes: understanding, perception, and controlled behavior. Built on a compositional task graph, MetaFine absorbs heterogeneous external benchmarks and reconstructs them into diagnostic scenarios of varying complexity under a unified protocol. Evaluating state-of-the-art vision-language-action (VLA) models through…
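The contrast the abstract draws between a single binary success rate and a per-axis diagnosis can be illustrated with a small sketch. This is not MetaFine's implementation: the `Episode` record, the per-axis scores in [0, 1], the averaging rule, and the toy data are all assumptions made up for illustration. Only the three axis names come from the abstract.

```python
from dataclasses import dataclass
from statistics import mean

# Axis names taken from the abstract; everything else here is hypothetical.
AXES = ("understanding", "perception", "controlled_behavior")


@dataclass(frozen=True)
class Episode:
    """One evaluation rollout (hypothetical record, not the paper's schema)."""
    success: bool       # the usual binary task outcome
    axis_scores: dict   # axis name -> score in [0, 1], assumed to be available


def binary_success_rate(episodes):
    """Fraction of episodes marked successful: the collapsed metric."""
    return mean(1.0 if e.success else 0.0 for e in episodes)


def axis_profile(episodes):
    """Mean score per axis: a simple stand-in for a diagnostic breakdown."""
    return {axis: mean(e.axis_scores[axis] for e in episodes) for axis in AXES}


# Invented toy data: some episodes count as "successful" even though they
# score poorly on perception or controlled behavior.
episodes = [
    Episode(True,  {"understanding": 1.0, "perception": 1.0, "controlled_behavior": 1.0}),
    Episode(True,  {"understanding": 1.0, "perception": 0.5, "controlled_behavior": 0.0}),
    Episode(False, {"understanding": 1.0, "perception": 0.0, "controlled_behavior": 0.0}),
    Episode(True,  {"understanding": 1.0, "perception": 0.5, "controlled_behavior": 1.0}),
]

print("binary success rate:", binary_success_rate(episodes))
for axis, score in axis_profile(episodes).items():
    print(f"{axis}: {score}")
```

On this invented data the binary metric reports 0.75 while the perception and controlled-behavior averages are both 0.5, which is the kind of gap a per-axis report makes visible and a single success rate hides.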

Code & Models

Repositories

https://metafine.github.io
