Translating a Visual LEGO Manual to a Machine-Executable Plan

Ruocheng Wang; Yunzhi Zhang; Jiayuan Mao; Chin-Yi Cheng; Jiajun Wu

arXiv:2207.12572·cs.CV·July 27, 2022

TL;DR

This paper introduces MEPNet, a learning-based framework that translates visual LEGO manuals into machine-executable assembly plans by predicting component placements and poses, handling unseen parts and establishing 2D-3D correspondence.

Contribution

The paper presents a novel neural network architecture, MEPNet, that effectively converts image-based assembly instructions into 3D plans, outperforming existing methods on multiple datasets.

Findings

01

MEPNet achieves higher accuracy than previous methods.

02

The framework generalizes well to unseen components.

03

It effectively establishes 2D-3D correspondence for assembly tasks.

Abstract

We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions. We formulate this problem as a sequential prediction task: at each step, our model reads the manual, locates the components to be added to the current shape, and infers their 3D poses. This task poses the challenge of establishing a 2D-3D correspondence between the manual image and the real 3D object, and 3D pose estimation for unseen 3D objects, since a new component to be added in a step can be an object built from previous steps. To address these two challenges, we present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images. The key idea is to integrate neural 2D keypoint detection modules and 2D-3D projection algorithms for…
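The abstract names 2D-3D projection as half of MEPNet's key idea but is cut off before the details. The sketch below illustrates what lifting a detected 2D keypoint into 3D can look like in the simplest setting. It is a generic ray-plane intersection, not the paper's algorithm. It assumes a pinhole camera with known intrinsics `K` and world-to-camera pose `(R, t)`, and it assumes the keypoint's 3D point lies on a known horizontal plane `z = plane_z`. For LEGO, one plausible choice is the height of the surface a brick attaches to, but that is an illustrative assumption. The function names `project` and `backproject_to_plane` are hypothetical.

```python
import numpy as np


def project(K, R, t, X):
    """Project a 3D world point X to pixel coordinates (u, v).

    Assumed convention for this sketch: X_cam = R @ X + t, pinhole intrinsics K.
    """
    x_cam = R @ X + t
    uvw = K @ x_cam
    return uvw[:2] / uvw[2]


def backproject_to_plane(K, R, t, uv, plane_z):
    """Lift a 2D keypoint to 3D by intersecting its viewing ray with the
    horizontal world plane z = plane_z (hypothetical helper)."""
    center = -R.T @ t                          # camera center in world coordinates
    pixel_h = np.array([uv[0], uv[1], 1.0])    # homogeneous pixel coordinates
    ray_cam = np.linalg.solve(K, pixel_h)      # K^-1 p; its z-component is 1
    direction = R.T @ ray_cam                  # ray direction in the world frame
    if abs(direction[2]) < 1e-9:
        raise ValueError("viewing ray is parallel to the plane")
    s = (plane_z - center[2]) / direction[2]   # equals depth along the camera z-axis
    if s <= 0:
        raise ValueError("plane intersection lies behind the camera")
    return center + s * direction


# Usage: a camera 10 units from the plane z = 0, looking along +z.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 10.0])
uv = project(K, R, t, np.array([1.0, 2.0, 0.0]))      # approximately [370, 340]
X_hat = backproject_to_plane(K, R, t, uv, plane_z=0.0)  # approximately [1, 2, 0]
```

A single pixel constrains only a ray. Fixing the height collapses the remaining ambiguity to one unknown along that ray, which is why some extra 3D constraint (here a plane, in general something learned or derived from the current shape) is needed to close the 2D-3D loop.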

Taxonomy

Topics: Handwritten Text Recognition Techniques · Human Pose and Action Recognition · Robot Manipulation and Learning