Perceive, Represent, Generate: Translating Multimodal Information to Robotic Motion Trajectories
Fábio Vital, Miguel Vasco, Alberto Sardinha, and Francisco Melo

TL;DR
This paper introduces PRG, a three-stage framework that translates multimodal perceptual inputs into robotic motion trajectories, demonstrated on a robotic handwriting task in which the robot receives a word through different modalities, such as images or sounds.
Contribution
The paper presents a novel three-stage framework combining perception, multimodal encoding, and trajectory generation for robotic motion based on multimodal instructions.
Findings
Effective translation of multimodal inputs into handwriting trajectories
Successful implementation on a robotic platform for writing tasks
Demonstrated coherence and readability in generated handwriting
Abstract
We present Perceive-Represent-Generate (PRG), a novel three-stage framework that maps perceptual information of different modalities (e.g., visual or auditory), corresponding to a sequence of instructions, to an adequate sequence of movements to be executed by a robot. In the first stage, we perceive and pre-process the given inputs, isolating individual commands from the complete instruction provided by a human user. In the second stage, we encode the individual commands into a multimodal latent space, employing a deep generative model. Finally, in the third stage, we convert the multimodal latent values into individual trajectories and combine them into a single dynamic movement primitive, allowing its execution on a robotic platform. We evaluate our pipeline in the context of a novel robotic handwriting task, where the robot receives as input a word through different perceptual modalities…
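The third stage combines per-command trajectories into a single dynamic movement primitive (DMP). A minimal sketch of one standard way to do this is shown below: chain the per-letter trajectories end to start, then fit one discrete DMP (Ijspeert-style transformation system with a forcing term learned by locally weighted regression) to the combined path. This is not the authors' implementation. The straight-line "letters", the chaining rule, the helper names (`make_stroke`, `chain_strokes`, `DMP`), the gains, and the number of basis functions are all hypothetical choices for illustration, and the stage that decodes latent values into per-letter trajectories is omitted entirely.

```python
import numpy as np


def min_jerk(n: int) -> np.ndarray:
    """Minimum-jerk time scaling on [0, 1]: zero velocity and acceleration at both ends."""
    t = np.linspace(0.0, 1.0, n)
    return 10 * t**3 - 15 * t**4 + 6 * t**5


def make_stroke(start, end, n: int = 500) -> np.ndarray:
    """Hypothetical per-letter trajectory: a straight 2-D stroke from start to end."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    return start + min_jerk(n)[:, None] * (end - start)


def chain_strokes(strokes) -> np.ndarray:
    """Assumed combination rule: shift each stroke to begin where the previous one ended."""
    parts = [strokes[0]]
    for stroke in strokes[1:]:
        shifted = stroke - stroke[0] + parts[-1][-1]
        parts.append(shifted[1:])  # drop the duplicated junction sample
    return np.vstack(parts)


class DMP:
    """Discrete DMP, one transformation system per dimension, no (g - y0) amplitude scaling."""

    def __init__(self, n_basis=100, alpha_z=25.0, alpha_x=4.0):
        self.alpha_z, self.beta_z, self.alpha_x = alpha_z, alpha_z / 4.0, alpha_x
        # Basis centres spaced uniformly in time, expressed in the phase variable x.
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        d = np.abs(np.diff(self.c))
        self.h = 1.0 / np.append(d, d[-1]) ** 2  # widths from neighbour spacing

    def _psi(self, x):
        return np.exp(-self.h * (x[:, None] - self.c) ** 2)  # shape (T, n_basis)

    def fit(self, y, tau=1.0):
        T = len(y)
        self.tau, self.dt = tau, tau / (T - 1)
        self.y0, self.g = y[0].copy(), y[-1].copy()
        yd = np.gradient(y, self.dt, axis=0)
        ydd = np.gradient(yd, self.dt, axis=0)
        self.x = np.exp(-self.alpha_x * np.linspace(0.0, 1.0, T))  # canonical system
        # Forcing term that makes tau^2 y'' = alpha_z (beta_z (g - y) - tau y') + f hold.
        f_target = tau**2 * ydd - self.alpha_z * (self.beta_z * (self.g - y) - tau * yd)
        psi = self._psi(self.x)
        # Locally weighted regression of f ~ w_i * x, one weight per basis and dimension.
        num = psi.T @ (self.x[:, None] * f_target)
        den = psi.T @ (self.x**2)
        self.w = num / (den[:, None] + 1e-10)
        self.v0 = tau * yd[0]
        return self

    def rollout(self):
        psi = self._psi(self.x)
        f = self.x[:, None] * (psi @ self.w) / (psi.sum(axis=1, keepdims=True) + 1e-10)
        y, v = self.y0.copy(), self.v0.copy()
        out = [y.copy()]
        for k in range(len(self.x) - 1):  # explicit Euler on the transformation system
            vdot = (self.alpha_z * (self.beta_z * (self.g - y) - v) + f[k]) / self.tau
            y = y + v / self.tau * self.dt
            v = v + vdot * self.dt
            out.append(y)
        return np.array(out)


if __name__ == "__main__":
    letters = [make_stroke((0.0, 0.0), (1.0, 1.0)), make_stroke((0.0, 0.0), (1.0, -1.0))]
    demo = chain_strokes(letters)
    repro = DMP().fit(demo).rollout()
    print("max reproduction error:", np.abs(repro - demo).max())
```

The amplitude factor (g − y0) of the classic formulation is dropped here because, in chained handwriting, a dimension can end where it started (the vertical coordinate above returns to 0), which would make that factor vanish.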
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Natural Language Processing Techniques
