Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata, Motonari Kambara, Daichi Yashima, Ryosuke Korekata, Komei Sugiura

TL;DR
This paper presents a new model for generating mobile manipulation instructions from multiple images, using an automatic metric enhancement training method, leading to improved instruction quality and task performance.
Contribution
Introduces a novel model that generates instructions from target and receptacle images and a training method that incorporates automatic evaluation scores as rewards.
Findings
Outperforms baseline models on automatic metrics.
Enhances language instruction quality for mobile manipulation.
Improves task performance through augmented instruction data.
Abstract
We consider the problem of generating free-form mobile manipulation instructions based on a target object image and a receptacle image. Conventional image captioning models are not able to generate appropriate instructions because their architectures are typically optimized for single-image inputs. In this study, we propose a model that handles both the target object and the receptacle to generate free-form instruction sentences for mobile manipulation tasks. Moreover, we introduce a novel training method that effectively incorporates the scores from both learning-based and n-gram-based automatic evaluation metrics as rewards. This method enables the model to learn the co-occurrence relationships between words and appropriate paraphrases. Results demonstrate that our proposed method outperforms baseline methods including representative multimodal large language models on standard automatic…
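The idea of using automatic evaluation scores as rewards can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram score here is a simple clipped n-gram precision rather than any specific published metric, `learned_metric` is a hypothetical placeholder for a learning-based scorer, and the weighting and the baseline-subtracted policy-gradient loss are assumptions chosen only to show how the two kinds of scores could be combined into a single training signal.

```python
from collections import Counter
from typing import Callable, Sequence


def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram precision of candidate against one reference (toy n-gram metric)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Count each candidate n-gram at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / total


def combined_reward(
    candidate: str,
    reference: str,
    learned_metric: Callable[[str, str], float],  # hypothetical learning-based scorer
    weight: float = 0.5,                          # assumed mixing weight
) -> float:
    """Weighted sum of a learning-based score and an n-gram-based score."""
    learned = learned_metric(candidate, reference)
    ngram = ngram_precision(candidate, reference)
    return weight * learned + (1.0 - weight) * ngram


def policy_gradient_loss(
    token_log_probs: Sequence[float],
    reward: float,
    baseline_reward: float,
) -> float:
    """REINFORCE-style loss with a baseline: -(r - b) * sum of token log-probabilities."""
    advantage = reward - baseline_reward
    return -advantage * sum(token_log_probs)
```

In such a setup, a sampled instruction that scores above the baseline yields a positive advantage, so minimizing the loss raises the probability of its tokens; one that scores below the baseline is pushed down.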
Taxonomy
Topics: Human Motion and Animation · Hand Gesture Recognition Systems · Advanced Numerical Analysis Techniques
