GenerativeMPC: VLM-RAG-guided Whole-Body MPC with Virtual Impedance for Bimanual Mobile Manipulation

Marcelino Julio Fernando; Miguel Altamirano Cabrera; Jeffrin Sam; Yara Mahmoud; Konstantin Gubernatorov; Dzmitry Tsetserukou

arXiv:2604.19522·cs.RO·April 22, 2026

TL;DR

GenerativeMPC integrates semantic scene understanding with physical control for bimanual mobile manipulation using VLM-RAG and experience-driven grounding, enabling safe, context-aware interaction.

Contribution

It introduces a hierarchical framework that bridges semantic reasoning with physical control parameters via VLM-RAG and experience-driven grounding, advancing human-centric cyber-physical manipulation.

Findings

01 Achieved a 60% speed reduction near humans for safety.

02 Enabled safe, socially aware navigation and manipulation.

03 Validated the approach in the MuJoCo and IsaacSim simulators and on a physical platform.
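
To make the first finding concrete, the sketch below shows one way a 60% speed reduction could be applied to a velocity limit when a human is nearby. It is an illustration under stated assumptions, not the paper's implementation: `VelocityLimit`, `limit_near_human`, and the 1.5 m trigger radius are hypothetical, and only the 60% figure comes from the findings above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VelocityLimit:
    """Hypothetical container for the limits handed to a controller."""
    linear: float   # m/s
    angular: float  # rad/s


def limit_near_human(nominal: VelocityLimit,
                     human_distance_m: float,
                     trigger_radius_m: float = 1.5,   # assumed, not from the paper
                     reduction: float = 0.60) -> VelocityLimit:
    """Scale the nominal limits down by `reduction` inside the trigger radius."""
    if human_distance_m > trigger_radius_m:
        return nominal
    scale = 1.0 - reduction
    return VelocityLimit(nominal.linear * scale, nominal.angular * scale)


nominal = VelocityLimit(linear=1.0, angular=2.0)
far = limit_near_human(nominal, human_distance_m=3.0)
near = limit_near_human(nominal, human_distance_m=1.0)
```

A hard threshold is the simplest choice here; a real system would more likely blend the limit smoothly with distance to avoid abrupt speed changes.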

Abstract

Bimanual mobile manipulation requires seamless integration of high-level semantic reasoning with safe, compliant physical interaction, a challenge that end-to-end models approach opaquely and that classical controllers lack the context to address. This paper presents GenerativeMPC, a hierarchical cyber-physical framework that explicitly bridges semantic scene understanding with physical control parameters for bimanual mobile manipulators. The system uses a Vision-Language Model with Retrieval-Augmented Generation (VLM-RAG) to translate visual and linguistic context into grounded control constraints, specifically dynamic velocity limits and safety margins for a Whole-Body Model Predictive Controller (MPC). Simultaneously, the VLM-RAG module modulates virtual stiffness and damping gains for a unified impedance-admittance controller, enabling context-aware compliance…
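
The interface described in the abstract can be pictured as a small set of parameters passed from the semantic layer to the controllers. The following is a minimal sketch, assuming a fixed lookup table in place of the VLM-RAG module and a standard spring-damper law for the virtual impedance; `ControlParams`, `PRESETS`, `params_for`, `impedance_force`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlParams:
    """Hypothetical bundle of the quantities the abstract says are set from context."""
    max_speed: float      # m/s, velocity limit for the MPC
    safety_margin: float  # m, minimum clearance for the MPC
    stiffness: float      # N/m, virtual spring gain
    damping: float        # N*s/m, virtual damper gain


# Illustrative presets standing in for the VLM-RAG output (values are assumptions).
PRESETS = {
    "free_space": ControlParams(1.0, 0.2, 800.0, 60.0),
    "near_human": ControlParams(0.4, 0.6, 200.0, 40.0),
}


def params_for(context: str) -> ControlParams:
    """Return the preset for a context label, falling back to the conservative one."""
    return PRESETS.get(context, PRESETS["near_human"])


def impedance_force(p: ControlParams, x: float, x_ref: float,
                    v: float, v_ref: float) -> float:
    """1-D virtual impedance: spring on position error plus damper on velocity error."""
    return p.stiffness * (x_ref - x) + p.damping * (v_ref - v)


soft = params_for("near_human")
stiff = params_for("free_space")
f_soft = impedance_force(soft, x=0.0, x_ref=0.1, v=0.0, v_ref=0.0)
f_stiff = impedance_force(stiff, x=0.0, x_ref=0.1, v=0.0, v_ref=0.0)
```

For the same position error, the lower stiffness preset commands a smaller restoring force, which is the sense in which reduced gains make the arm more compliant around people. Unknown context labels fall back to the conservative preset so that a failed lookup never loosens the limits.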
