MuTT: A Multimodal Trajectory Transformer for Robot Skills
Claudius Kienle, Benjamin Alt, Onur Celik, Philipp Becker, Darko Katic, Rainer Jäkel, Gerhard Neumann

TL;DR
MuTT is a transformer-based model that predicts environment-aware robot skill executions by fusing vision and trajectory data, enabling efficient parameter optimization without real-world trials.
Contribution
We introduce MuTT, a novel multimodal transformer architecture that fuses vision and trajectory data for environment-aware robot skill prediction and optimization.
Findings
MuTT outperforms existing methods in predicting robot skill executions.
It enables environment-aware parameter optimization without real-world trials.
It demonstrates versatility across different robot skill representations.
Abstract
High-level robot skills represent an increasingly popular paradigm in robot programming. However, configuring the skills' parameters for a specific task remains a manual and time-consuming endeavor. Existing approaches for learning or optimizing these parameters often require numerous real-world executions or do not work in dynamic environments. To address these challenges, we propose MuTT, a novel encoder-decoder transformer architecture designed to predict environment-aware executions of robot skills by integrating vision, trajectory, and robot skill parameters. Notably, we pioneer the fusion of vision and trajectory, introducing a novel trajectory projection. Furthermore, we illustrate MuTT's efficacy as a predictor when combined with a model-based robot skill optimizer. This approach facilitates the optimization of robot skill parameters for the current environment, without the need…
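To make the idea of optimizing skill parameters against a learned predictor more concrete, here is a minimal sketch of the general pattern: a predictor maps skill parameters and an environment description to a trajectory, and the parameters are tuned purely on predictions, with no real-world execution in the loop. Everything in it is an assumption for illustration. `predict_trajectory` is a hand-written toy stand-in, not MuTT; `env_offset` is a hypothetical scalar replacing the image input; the cost function and the finite-difference gradient descent are placeholder choices, since the abstract does not specify the optimizer.

```python
from typing import Callable, List


def predict_trajectory(params: List[float], env_offset: float, steps: int = 10) -> List[float]:
    """Toy stand-in for a learned predictor (NOT MuTT).

    params[0] is a commanded 1-D target position and params[1] a velocity
    that causes overshoot; env_offset is a hypothetical environment effect.
    """
    end = params[0] + 0.1 * params[1] + env_offset
    # Linear motion from the origin to the predicted end position.
    return [end * (i + 1) / steps for i in range(steps)]


def cost(trajectory: List[float], params: List[float], goal: float) -> float:
    """Assumed task cost: miss the goal as little as possible, move slowly."""
    return (trajectory[-1] - goal) ** 2 + params[1] ** 2


def optimize(params: List[float], objective: Callable[[List[float]], float],
             lr: float = 0.1, iters: int = 200, eps: float = 1e-4) -> List[float]:
    """Gradient descent with central finite differences on the predicted cost."""
    params = list(params)
    for _ in range(iters):
        grad = []
        for i in range(len(params)):
            up = params.copy()
            up[i] += eps
            down = params.copy()
            down[i] -= eps
            grad.append((objective(up) - objective(down)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


goal, env_offset = 0.5, 0.05


def objective(p: List[float]) -> float:
    # The cost is evaluated on predicted trajectories only.
    return cost(predict_trajectory(p, env_offset), p, goal)


best = optimize([0.0, 1.0], objective)
print(best)
```

In this toy setting the optimizer should settle near a commanded position of `goal - env_offset` with a velocity close to zero. With a learned, differentiable predictor, the finite-difference step could plausibly be replaced by automatic differentiation through the model.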
Taxonomy
Topics: Robotics and Automated Systems · Robot Manipulation and Learning · Social Robot Interaction and HRI
