Visually Grounding Language Instruction for History-Dependent Manipulation
Hyemin Ahn, Obin Kwon, Kyoungdo Kim, Jaeyeon Jeong, Howoong Jun, Hongjung Lee, Dongheui Lee, Songhwai Oh

TL;DR
This paper introduces a history-dependent manipulation task for robots. It argues that referring to the task history helps a robot interpret language instructions and infer visual information about occluded objects, and it supports the task with a new dataset and baseline model.
Contribution
It proposes a novel history-dependent manipulation task, a related dataset, and a baseline model that can be applied to real-world scenarios using CycleGAN.
Findings
A model trained on the dataset effectively interprets history-dependent instructions.
The approach enables inference of occluded objects based on manipulation history.
The dataset and model are publicly available for further research.
Abstract
This paper emphasizes the importance of a robot's ability to refer to its task history, especially when it executes a series of pick-and-place manipulations by following language instructions given one at a time. Referring to the manipulation history offers two advantages: (1) language instructions that omit details but use expressions referring to the past can be interpreted, and (2) visual information about objects occluded by previous manipulations can be inferred. To this end, we introduce a history-dependent manipulation task whose objective is to visually ground a series of language instructions for proper pick-and-place manipulations by referring to the past. We also propose a relevant dataset and a model that can serve as a baseline, and show that our model, trained on the proposed dataset, can also be applied in the real world using CycleGAN. Our…
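To make the two advantages concrete, here is a toy, purely symbolic sketch in Python. It is not the paper's model, which grounds instructions in images with a learned network. The classes `Step` and `ManipulationHistory`, their methods, and the stacking rule (a placed object lands on top of, and hides, whatever is already at that location) are hypothetical assumptions chosen only to show how a replayed history can resolve past-referring phrases and reveal occluded objects.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One pick-and-place step (hypothetical record format)."""
    obj: str  # object that was picked
    src: str  # location it was picked from
    dst: str  # location it was placed at


@dataclass
class ManipulationHistory:
    """Toy symbolic history of pick-and-place steps; not the paper's model."""
    steps: list = field(default_factory=list)

    def record(self, obj: str, src: str, dst: str) -> None:
        self.steps.append(Step(obj, src, dst))

    def last_moved(self) -> str:
        # Resolves past-referring phrases such as "the object you just moved".
        return self.steps[-1].obj

    def original_location(self, obj: str) -> str:
        # Resolves "put it back where it was": the first recorded source of obj.
        for step in self.steps:
            if step.obj == obj:
                return step.src
        raise KeyError(f"{obj!r} never appears in the history")

    def piles(self) -> dict:
        # Replay the history. Assumption: each location holds a stack, and a
        # placed object lands on top of whatever is already there.
        stacks: dict = {}
        for step in self.steps:
            src_pile = stacks.get(step.src)
            if src_pile and src_pile[-1] == step.obj:
                src_pile.pop()  # the picked object leaves its source stack
            stacks.setdefault(step.dst, []).append(step.obj)
        return stacks

    def occluded_at(self, location: str) -> list:
        # Everything under the top object is hidden from a camera looking down,
        # but the history still says it is there.
        return self.piles().get(location, [])[:-1]


if __name__ == "__main__":
    history = ManipulationHistory()
    history.record("red block", "shelf", "tray")
    history.record("blue block", "shelf", "tray")  # covers the red block
    history.record("green block", "shelf", "bin")
    print(history.last_moved())                     # green block
    print(history.occluded_at("tray"))              # ['red block']
    print(history.original_location("blue block"))  # shelf
```

The design choice here is to keep only the step log and recompute the scene by replaying it, so each query ("last moved", "where it was", "what is hidden") is answered from the same source of truth that the abstract argues a robot should consult.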
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Natural Language Processing Techniques
Methods: GAN Least Squares Loss · Residual Connection · Tanh Activation · PatchGAN · Convolution · Instance Normalization · Cycle Consistency Loss · Sigmoid Activation
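Several of the listed methods (GAN Least Squares Loss, Cycle Consistency Loss, PatchGAN) are building blocks of CycleGAN, which the abstract says is used to apply the model trained on the proposed dataset to the real world. As a rough illustration, the NumPy sketch below computes standard CycleGAN-style objectives on arrays of discriminator scores and reconstructed images. The function names, the cycle weight `lam = 10` (the default reported in the original CycleGAN paper), and the 0.5 factor on the discriminator loss are assumptions; the settings used in this paper are not stated here.

```python
import numpy as np


def lsgan_discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    # Least-squares GAN objective for the discriminator: push scores on real
    # images toward 1 and scores on generated images toward 0. With a PatchGAN
    # discriminator the scores are per-patch maps, so we average over patches.
    # The 0.5 factor halves D's objective (an assumed, common choice).
    return float(0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)))


def lsgan_generator_loss(d_fake: np.ndarray) -> float:
    # The generator tries to make the discriminator score its outputs as 1.
    return float(np.mean((d_fake - 1.0) ** 2))


def cycle_consistency_loss(x: np.ndarray, x_cycled: np.ndarray,
                           y: np.ndarray, y_cycled: np.ndarray,
                           lam: float = 10.0) -> float:
    # Mean L1 error of F(G(x)) against x and of G(F(y)) against y, weighted by lam.
    forward = np.mean(np.abs(x_cycled - x))
    backward = np.mean(np.abs(y_cycled - y))
    return float(lam * (forward + backward))


def toy_g(a: np.ndarray) -> np.ndarray:
    # Hypothetical "sim -> real" generator: doubles intensities.
    return 2.0 * a


def toy_f(b: np.ndarray) -> np.ndarray:
    # Hypothetical "real -> sim" generator: halves intensities, inverting toy_g.
    return b / 2.0


if __name__ == "__main__":
    x = np.ones((4, 4))        # stand-in for a simulated image
    y = 2.0 * np.ones((4, 4))  # stand-in for a real image
    print(cycle_consistency_loss(x, toy_f(toy_g(x)), y, toy_g(toy_f(y))))  # 0.0
    print(lsgan_generator_loss(np.zeros((2, 2))))                          # 1.0
```

Because `toy_f` exactly inverts `toy_g`, the cycle loss is zero; in a real CycleGAN both generators are networks, and this term is what discourages them from changing image content while translating between the simulated and real domains.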
