VA-FastNavi-MARL: Real-Time Robot Control with Multimedia-Driven Meta-Reinforcement Learning
Yang Zhang, Shengxi Jing, Fengxiang Wang, Yuan Feng, Hong Wang

TL;DR
VA-FastNavi-MARL is a novel framework that enables real-time, low-latency robot control by integrating multimedia inputs into a unified representation and using meta-reinforcement learning for rapid adaptation.
Contribution
The paper introduces a modality-agnostic, multimedia-driven meta-reinforcement learning approach for real-time robot navigation and control, improving adaptability and efficiency.
Findings
Outperforms baselines in sample efficiency.
Maintains robust real-time control under noisy multimedia streams.
Enables rapid adaptation to unseen instructions.
Abstract
Interpreting dynamic, heterogeneous multimedia commands with real-time responsiveness is critical for Human-Robot Interaction. We present VA-FastNavi-MARL, a framework that aligns asynchronous audio-visual inputs into a unified latent representation. By treating diverse instructions as a distribution of navigable goals via Meta-Reinforcement Learning, our method enables rapid adaptation to unseen directives with negligible inference overhead. Unlike approaches bottlenecked by heavy sensory processing, our modality-agnostic stream ensures seamless, low-latency control. Validation on a multi-arm workspace confirms that VA-FastNavi-MARL significantly outperforms baselines in sample efficiency and maintains robust, real-time execution even under noisy multimedia streams.
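The abstract does not say how the asynchronous audio and visual inputs are aligned into one latent representation. The sketch below illustrates one possible approach under our own assumptions. It is not the authors' method.

In this sketch, each modality arrives as a time-sorted list of (timestamp, feature) pairs. At each control tick, the controller takes the most recent sample from every stream (a zero-order hold). It then projects each modality into a shared latent space with a hypothetical linear map and averages whichever modalities are present. The names `latest_at`, `project`, and `fuse`, the stream format, and the averaging rule are all illustrative.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed stream format (hypothetical): time-sorted (timestamp_seconds, feature_vector) pairs.
Stream = List[Tuple[float, List[float]]]


def latest_at(stream: Stream, t: float) -> Optional[List[float]]:
    """Zero-order hold: most recent feature with timestamp <= t, or None if none yet."""
    times = [ts for ts, _ in stream]
    i = bisect_right(times, t)
    return stream[i - 1][1] if i > 0 else None


def project(vec: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical linear map from one modality's features into the shared latent space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def fuse(streams: Dict[str, Stream],
         projections: Dict[str, Sequence[Sequence[float]]],
         t: float,
         latent_dim: int) -> List[float]:
    """Average the projected features of every modality with data at or before tick t.

    Averaging over whichever modalities are present keeps the latent size fixed,
    so the downstream policy sees the same input shape regardless of which
    modalities have arrived (an illustrative form of modality-agnostic fusion).
    """
    projected = []
    for name, stream in streams.items():
        vec = latest_at(stream, t)
        if vec is not None:
            projected.append(project(vec, projections[name]))
    if not projected:
        return [0.0] * latent_dim  # nothing observed yet
    return [sum(column) / len(projected) for column in zip(*projected)]


# Example: audio features every 20 ms, video features roughly every 33 ms.
audio = [(0.000, [1.0, 0.0]), (0.020, [0.0, 1.0])]
video = [(0.000, [2.0]), (0.033, [4.0])]
projections = {"audio": [[1.0, 0.0], [0.0, 1.0]], "video": [[1.0], [1.0]]}
streams = {"audio": audio, "video": video}

print(fuse(streams, projections, t=0.025, latent_dim=2))  # [1.0, 1.5]
print(fuse(streams, projections, t=0.040, latent_dim=2))  # [2.0, 2.5]
```

Sampling on the controller's own clock means the policy never waits for the slowest sensor. The cost is that a stale sample from a low-rate stream may be reused for several ticks. A learned encoder would replace the fixed `project` weights in any practical system.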
