MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence

Renjun Gao

arXiv:2511.01594 · cs.RO · April 8, 2026

TL;DR

MARS is a multi-agent robotic system leveraging multimodal large language models to provide adaptive, risk-aware, and personalized assistive services in smart homes for people with disabilities.

Contribution

The paper introduces a multi-agent framework that integrates MLLMs into assistive robotics, targeting three gaps in existing systems: risk-aware planning, user personalization, and grounding language plans into executable skills in cluttered indoor environments.

Findings

01. Outperforms state-of-the-art models in risk-aware planning.

02. Demonstrates effective multi-agent coordination in dynamic settings.

03. Shows potential for real-world assistive applications.

Abstract

Multimodal large language models (MLLMs) have shown remarkable capabilities in cross-modal understanding and reasoning, offering new opportunities for intelligent assistive systems. Yet existing systems still struggle with risk-aware planning, user personalization, and grounding language plans into executable skills in cluttered homes. We introduce MARS, a Multi-Agent Robotic System powered by MLLMs for assistive intelligence, designed for smart home robots supporting people with disabilities. The system integrates four agents: a visual perception agent for extracting semantic and spatial features from environment images, a risk assessment agent for identifying and prioritizing hazards, a planning agent for generating executable action sequences, and an evaluation agent for iterative optimization. By combining multimodal perception with hierarchical multi-agent decision-making, the…
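The four-agent loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (perceive, assess_risk, plan, evaluate, run_mars_sketch, the SEVERITY table, and the "notify user" rule) is a hypothetical stand-in, and each agent is a plain function rather than an MLLM call. It only shows the control flow: perception feeds risk assessment, hazards are ordered by priority, and the planner is re-run with the evaluator's feedback until the plan is accepted or a round limit is reached.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hazard:
    label: str
    severity: int  # hypothetical scale: higher means more urgent


# Hypothetical lookup standing in for an MLLM's risk judgment.
SEVERITY = {"spilled water": 3, "loose cable": 2, "open drawer": 1}


def perceive(image_description: str) -> List[str]:
    """Perception stand-in: turn a comma-separated description into labels."""
    return [s.strip() for s in image_description.split(",") if s.strip()]


def assess_risk(objects: List[str]) -> List[Hazard]:
    """Risk stand-in: keep known hazards, most severe first."""
    hazards = [Hazard(o, SEVERITY[o]) for o in objects if o in SEVERITY]
    return sorted(hazards, key=lambda h: h.severity, reverse=True)


def plan(hazards: List[Hazard], feedback: List[str]) -> List[str]:
    """Planning stand-in: one step per hazard, plus steps the evaluator asked for."""
    return list(feedback) + [f"mitigate {h.label}" for h in hazards]


def evaluate(steps: List[str]) -> List[str]:
    """Evaluation stand-in: return missing steps; an empty list accepts the plan.

    The rule that a plan must start by notifying the user is an assumption
    made for illustration only.
    """
    if steps and steps[0] == "notify user":
        return []
    return ["notify user"]


def run_mars_sketch(image_description: str, max_rounds: int = 3) -> List[str]:
    """Run perception -> risk -> (plan <-> evaluate) until accepted or out of rounds."""
    hazards = assess_risk(perceive(image_description))
    feedback: List[str] = []
    steps: List[str] = []
    for _ in range(max_rounds):
        steps = plan(hazards, feedback)
        feedback = evaluate(steps)
        if not feedback:
            break
    return steps


if __name__ == "__main__":
    print(run_mars_sketch("chair, loose cable, spilled water"))
```

In this sketch the first plan is rejected because it lacks the notification step, and the second round incorporates the evaluator's feedback. A real system would replace each stub with a model call and a richer scene representation.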
