Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks
Ryosuke Korekata, Motonari Kambara, Yu Yoshida, Shintaro Ishikawa, Yosuke Kawasaki, Masaki Takahashi, Komei Sugiura

TL;DR
This paper introduces Switching Head-Tail Funnel UNITER, a model that efficiently understands natural language instructions for domestic robots by predicting objects and destinations separately, improving accuracy and success rates in real-world tasks.
Contribution
The paper presents a novel model that reduces computational complexity by predicting target objects and destinations separately with a single model. It is validated on a newly built dataset and in real-robot experiments.
Findings
Outperforms baseline in language comprehension accuracy
Achieves over 90% success rate in object delivery tasks
Effective in real-world domestic robot scenarios
Abstract
This paper describes a domestic service robot (DSR) that fetches everyday objects and carries them to specified destinations according to free-form natural language instructions. Given an instruction such as "Move the bottle on the left side of the plate to the empty chair," the DSR is expected to identify the bottle and the chair from multiple candidates in the environment and carry the target object to the destination. Most of the existing multimodal language understanding methods are impractical in terms of computational complexity because they require inferences for all combinations of target object candidates and destination candidates. We propose Switching Head-Tail Funnel UNITER, which solves the task by predicting the target object and the destination individually using a single model. Our method is validated on a newly-built dataset consisting of object manipulation…
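The complexity argument in the abstract can be illustrated with a small counting sketch. It assumes, hypothetically, that a pairwise method runs one inference per (target candidate, destination candidate) combination, while a method that predicts the two individually runs one inference per candidate. The function names and candidate counts below are illustrative and are not taken from the paper; the paper's own cost accounting may differ.

```python
def pairwise_inference_count(num_targets: int, num_destinations: int) -> int:
    """Inferences needed when every (target, destination) pair is scored jointly."""
    return num_targets * num_destinations


def separate_inference_count(num_targets: int, num_destinations: int) -> int:
    """Inferences needed when targets and destinations are scored individually.

    Assumption: one inference per candidate, all served by a single model.
    """
    return num_targets + num_destinations


if __name__ == "__main__":
    # Illustrative candidate counts only; not figures from the paper.
    m, n = 10, 10
    print("pairwise:", pairwise_inference_count(m, n))   # 10 * 10 = 100
    print("separate:", separate_inference_count(m, n))   # 10 + 10 = 20
```

Under these assumptions the pairwise count grows with the product of the two candidate sets, whereas the separate count grows with their sum, which is the gap the abstract describes as making combination-based methods impractical.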
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Robot Manipulation and Learning
Methods: UNiversal Image-TExt Representation Learning (UNITER)
