Sketch-MoMa: Teleoperation for Mobile Manipulator via Interpretation of Hand-Drawn Sketches
Kosei Tanada, Yuka Iwanaga, Masayoshi Tsuchinaga, Yuji Nakamura, Takemitsu Mori, Remi Sakai, Takashi Yamamoto

TL;DR
Sketch-MoMa lets users teleoperate a mobile manipulator with hand-drawn sketches: a vision-language model interprets each sketch in the context of the observed scene, so users do not need additional input modalities to specify what a sketch means.
Contribution
This work introduces a sketch-based teleoperation system that uses VLMs to interpret hand-drawn sketches, removing the additional modalities that existing approaches need to set a sketch's semantics and thereby improving usability.
Findings
Effective interpretation of sketches with VLMs for robot control
Successful validation on 7 tasks with 5 sketch shapes
User study shows competitive usability compared to traditional interfaces
Abstract
To use assistive robots in everyday life, a remote control system with common devices, such as 2D devices, is helpful to control the robots anytime and anywhere as intended. Hand-drawn sketches are one of the intuitive ways to control robots with 2D devices. However, since similar sketches have different intentions from scene to scene, existing work needs additional modalities to set the sketches' semantics. This requires complex operations for users and leads to decreasing usability. In this paper, we propose Sketch-MoMa, a teleoperation system using the user-given hand-drawn sketches as instructions to control a robot. We use Vision-Language Models (VLMs) to understand the user-given sketches superimposed on an observation image and infer drawn shapes and low-level tasks of the robot. We utilize the sketches and the generated shapes for recognition and motion planning of the generated…
Taxonomy
Topics: Interactive and Immersive Displays · Teleoperation and Haptic Systems · Human Motion and Animation
