ArtiBench and ArtiBrain: Benchmarking Generalizable Vision-Language Articulated Object Manipulation
Yuhan Wu, Tiantian Wei, Shuo Wang, ZhiChao Wang, Yanyong Zhang, Daniel Cremers, Yan Xia

TL;DR
This paper introduces ArtiBench, a comprehensive benchmark for evaluating generalizable vision-language manipulation of articulated objects, and proposes ArtiBrain, a modular framework that combines reasoning and adaptive control to improve manipulation robustness and generalization.
Contribution
We present ArtiBench, a new benchmark for structured evaluation of articulated object manipulation, and ArtiBrain, a novel framework integrating reasoning, control, and memory for enhanced generalization.
Findings
ArtiBrain outperforms existing methods in robustness and generalization on ArtiBench.
The benchmark reveals key challenges in cross-part and cross-instance manipulation.
The framework effectively combines high-level reasoning with low-level adaptive control.
Abstract
Interactive articulated manipulation requires long-horizon, multi-step interactions with appliances while maintaining physical consistency. Existing vision-language and diffusion-based policies struggle to generalize across parts, instances, and categories. We first introduce ArtiBench, a five-level benchmark covering kitchen, storage, office, and tool environments. ArtiBench enables structured evaluation from cross-part and cross-instance variation to long-horizon multi-object tasks, revealing the core generalization challenges of articulated object manipulation. Building on this benchmark, we propose ArtiBrain, a modular framework that unifies high-level reasoning with adaptive low-level control. ArtiBrain uses a VLM-based Task Reasoner (GPT-4.1) to decompose and validate subgoals, and employs a Hybrid Controller that combines geometry-aware keyframe execution with affordance-guided…
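The abstract describes ArtiBrain's control flow only at a high level: a VLM-based Task Reasoner decomposes a task into subgoals and validates them, and a Hybrid Controller executes them. The following is a minimal, hypothetical sketch of such a decompose-execute-validate loop. It assumes the reasoner and controller can be treated as plain callables. `run_task`, `Subgoal`, `EpisodeLog`, and `max_retries` are illustrative names, not the authors' API, and the retry-then-abort policy is an assumption rather than the paper's method.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical types: the paper's actual interfaces are not given in this summary.
@dataclass
class Subgoal:
    description: str


@dataclass
class EpisodeLog:
    # Simple per-episode record of outcomes (illustrative, not the paper's memory module)
    completed: List[str] = field(default_factory=list)
    failures: List[str] = field(default_factory=list)


def run_task(
    instruction: str,
    decompose: Callable[[str], List[Subgoal]],  # stands in for the Task Reasoner's decomposition
    execute: Callable[[Subgoal], None],         # stands in for the Hybrid Controller
    validate: Callable[[Subgoal], bool],        # stands in for the reasoner's subgoal check
    max_retries: int = 2,                       # assumed retry budget per subgoal
) -> EpisodeLog:
    log = EpisodeLog()
    for subgoal in decompose(instruction):
        for _ in range(max_retries + 1):
            execute(subgoal)
            if validate(subgoal):
                log.completed.append(subgoal.description)
                break
            log.failures.append(subgoal.description)
        else:
            # Every attempt failed validation: abort the remaining subgoals
            return log
    return log


if __name__ == "__main__":
    attempts = {}

    def decompose(text: str) -> List[Subgoal]:
        return [Subgoal(s.strip()) for s in text.split(" then ")]

    def execute(sg: Subgoal) -> None:
        attempts[sg.description] = attempts.get(sg.description, 0) + 1

    def validate(sg: Subgoal) -> bool:
        # Toy check: "open the drawer" only succeeds on its second attempt
        return not (sg.description == "open the drawer" and attempts[sg.description] < 2)

    print(run_task("open the drawer then place the cup inside", decompose, execute, validate))
    # EpisodeLog(completed=['open the drawer', 'place the cup inside'], failures=['open the drawer'])
```

Keeping the reasoner and controller as injected callables keeps the sketch independent of any particular VLM or robot backend. A real system would replace the toy functions with model queries and physical execution.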
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms
