OpenD: A Benchmark for Language-Driven Door and Drawer Opening
Yizhou Zhao, Qiaozi Gao, Liang Qiu, Govind Thattai, Gaurav S. Sukhatme

TL;DR
This paper presents OPEND, a benchmark for robotic hand manipulation guided by language instructions, combining neural networks and rule-based controllers in a physics-based simulation environment.
Contribution
It introduces a new benchmark and a multi-step planning approach integrating language understanding and spatial reasoning for robotic manipulation tasks.
Findings
Effective decision planning demonstrated in zero-shot evaluation.
Significant room for improvement in language understanding and manipulation.
Benchmark and challenges to foster future research.
Abstract
We introduce OPEND, a benchmark for learning how to use a hand to open cabinet doors or drawers in a photo-realistic and physics-reliable simulation environment driven by language instruction. To solve the task, we propose a multi-step planner composed of a deep neural network and rule-based controllers. The network is used to capture spatial relationships from images and understand semantic meaning from language instructions. Controllers efficiently execute the plan based on the spatial and semantic understanding. We evaluate our system by measuring its zero-shot performance on the test data set. Experimental results demonstrate the effectiveness of decision planning by our multi-step planner for different hands, while suggesting that there is significant room for developing better models to address the challenge brought by language understanding, spatial reasoning, and long-term…
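The division of labor described in the abstract (a network that interprets the instruction and the scene, followed by rule-based controllers that execute the plan) can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the paper: the `Handle` type, the function names, the stage names, and the keyword matching that stands in for the learned network are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Handle:
    """Hypothetical description of one detected handle in the scene."""
    kind: str                             # "door" or "drawer"
    location: str                         # e.g. "top", "bottom", "left", "right"
    position: Tuple[float, float, float]  # assumed 3D position in metres


def select_handle(instruction: str, handles: List[Handle]) -> Handle:
    """Stand-in for the neural network: plain keyword matching, not a learned model."""
    words = instruction.lower().split()
    for handle in handles:
        if handle.kind in words and handle.location in words:
            return handle
    raise ValueError("no handle matches the instruction")


def plan_steps(handle: Handle) -> List[str]:
    """Rule-based controller sketch: a fixed stage sequence chosen by handle type."""
    # Assumption: drawers are pulled in a straight line, doors along an arc.
    motion = "pull_straight" if handle.kind == "drawer" else "pull_along_arc"
    return ["approach", "grasp", motion]


handles = [
    Handle("drawer", "top", (0.4, 0.0, 0.8)),
    Handle("door", "left", (0.4, -0.3, 0.5)),
]
target = select_handle("open the top drawer", handles)
steps = plan_steps(target)
```

In this sketch the perception and language step only picks a target, and the controller turns that target into an ordered list of stages. A real system would replace `select_handle` with a trained model operating on images and text.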
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Test · Context Aggregated Bi-lateral Network for Semantic Segmentation
