MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation

Markus Knauer; Edoardo Fiorini; Maximilian Mühlbauer; Stefan Schneyer; Promwat Angsuratanawech; Florian Samuel Lay; Timo Bachmann; Samuel Bustamante; Korbinian Nottensteiner; Freek Stulp; Alin Albu-Schäffer; João Silvério; Thomas Eiband

arXiv:2604.20468 · cs.RO · April 24, 2026

TL;DR

This paper introduces MOMO, an integrated framework that lets industrial robots adapt their skills through kinesthetic, verbal, and graphical interaction, making adaptation more flexible and easier for non-expert users.

Contribution

The paper presents a novel multimodal interaction framework that combines five components for safe, flexible, and intuitive robot skill adaptation in industrial environments.

Findings

01  Validated on a 7-DoF robot at Automatica 2025
02  Demonstrated natural-language-based surface finishing
03  Showed generalization of skill adaptation across modalities

Abstract

Industrial robot applications require increasingly flexible systems that non-expert users can easily adapt for varying tasks and environments. However, different adaptations benefit from different interaction modalities. We present an interactive framework that enables robot skill adaptation through three complementary modalities: kinesthetic touch for precise spatial corrections, natural language for high-level semantic modifications, and a graphical web interface for visualizing geometric relations and trajectories, inspecting and adjusting parameters, and editing via-points by drag-and-drop. The framework integrates five components: energy-based human-intention detection, a tool-based LLM architecture (where the LLM selects and parameterizes predefined functions rather than generating code) for safe natural language adaptation, Kernelized Movement Primitives (KMPs) for motion…
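The safety idea behind the tool-based LLM architecture (the LLM selects and parameterizes predefined functions instead of generating code) can be illustrated with a minimal Python sketch. The abstract does not specify the actual interface, so this assumes the LLM emits a JSON object with `name` and `arguments` fields. The tools `set_speed_scale` and `set_contact_force`, their bounds, `skill_state`, and the `dispatch` helper are all hypothetical and not taken from MOMO. The sketch shows only the general pattern: a call naming an unregistered function, or carrying an argument outside its declared range, is rejected before anything reaches the robot.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Param:
    """Allowed numeric range for one tool argument (inclusive)."""
    low: float
    high: float


@dataclass(frozen=True)
class Tool:
    """A predefined robot function plus the schema of its arguments."""
    func: Callable[..., str]
    params: Dict[str, Param]


# Hypothetical skill parameters that the tools modify (not MOMO's data model).
skill_state: Dict[str, float] = {"speed_scale": 1.0, "contact_force_n": 5.0}


def set_speed_scale(scale: float) -> str:
    """Hypothetical tool: scale the execution speed of the current skill."""
    skill_state["speed_scale"] = scale
    return f"speed scale set to {scale}"


def set_contact_force(force_n: float) -> str:
    """Hypothetical tool: change the target contact force in newtons."""
    skill_state["contact_force_n"] = force_n
    return f"contact force set to {force_n} N"


# Whitelist: the LLM can only trigger functions registered here.
TOOLS: Dict[str, Tool] = {
    "set_speed_scale": Tool(set_speed_scale, {"scale": Param(0.1, 1.5)}),
    "set_contact_force": Tool(set_contact_force, {"force_n": Param(0.0, 20.0)}),
}


def dispatch(llm_output: str) -> str:
    """Parse a JSON tool call produced by the LLM, validate it, then execute it."""
    call: Any = json.loads(llm_output)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    tool = TOOLS[name]
    args = call.get("arguments", {})
    if not isinstance(args, dict) or set(args) != set(tool.params):
        raise ValueError(f"{name} expects arguments {sorted(tool.params)}")
    checked: Dict[str, float] = {}
    for key, bounds in tool.params.items():
        value = args[key]
        # Reject booleans and non-numbers; JSON true would otherwise pass as 1.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{key} must be a number, got {value!r}")
        # NaN also fails this chained comparison, so it is rejected as out of range.
        if not bounds.low <= value <= bounds.high:
            raise ValueError(f"{key}={value} outside [{bounds.low}, {bounds.high}]")
        checked[key] = float(value)
    # Only validated, bounded arguments ever reach the predefined function.
    return tool.func(**checked)


if __name__ == "__main__":
    print(dispatch('{"name": "set_contact_force", "arguments": {"force_n": 8}}'))
    # contact force set to 8.0 N
    try:
        dispatch('{"name": "set_speed_scale", "arguments": {"scale": 3.0}}')
    except ValueError as err:
        print("rejected:", err)
    # rejected: scale=3.0 outside [0.1, 1.5]
```

Keeping validation outside the LLM means the set of reachable robot behaviors is fixed by the developer, whatever text the model produces; the language model only chooses among them and fills in bounded parameters.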