MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation

Mehrshad Taji; Arad Mahdinezhad Kashani; Iman Ahmadi; AmirHossein Jadidi; Saina Kashani; Babak Khalaj

arXiv:2602.16898 · cs.RO · May 15, 2026

TL;DR

MALLVI introduces a multi-agent framework combining vision and language models for robust, closed-loop robotic manipulation that improves success rates through iterative feedback and specialized agent coordination.

Contribution

It presents a novel multi-agent system that integrates perception, reasoning, and feedback for robotic manipulation, outperforming prior open-loop approaches.

Findings

1. Closed-loop multi-agent coordination enhances manipulation success.
2. The framework generalizes well in zero-shot settings.
3. Code is available at https://github.com/iman1234ahmadi/MALLVI.

Abstract

Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, which makes them fragile in dynamic settings. MALLVI presents a Multi-Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Given a natural-language instruction and an image of the environment, MALLVI generates executable atomic actions for a robot manipulator. After each action is executed, a Vision Language Model (VLM) evaluates the environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than relying on a single model, MALLVI coordinates specialized agents (Decomposer, Localizer, Thinker, and Reflector) to manage perception, localization, reasoning, and high-level…
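The closed loop described in the abstract can be illustrated with a minimal Python sketch. It is not the MALLVI implementation: the agent interfaces (`decompose`, `localize`, `think`, `reflect`), the `AtomicAction` type, the 2D target point, and the retry budget `max_retries` are all hypothetical assumptions, and each agent is a plain callable standing in for an LLM or VLM call. It shows one plausible reading of the control flow: decompose the instruction, then for each subtask localize, reason, execute, and let the Reflector decide whether to retry or move on.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical sketch: none of these names come from the MALLVI codebase.
Point = Tuple[float, float]


@dataclass
class AtomicAction:
    name: str      # e.g. "pick" or "place" (assumed action vocabulary)
    target: Point  # assumed 2D target produced by the Localizer


@dataclass
class Agents:
    decompose: Callable[[str, object], List[str]]    # Decomposer: instruction + image -> subtasks
    localize: Callable[[str, object], Point]         # Localizer: subtask + image -> target point
    think: Callable[[str, Point], AtomicAction]      # Thinker: subtask + point -> atomic action
    reflect: Callable[[str, object], bool]           # Reflector (VLM): subtask + new image -> done?


def run_closed_loop(instruction: str,
                    capture_image: Callable[[], object],
                    execute: Callable[[AtomicAction], None],
                    agents: Agents,
                    max_retries: int = 3) -> List[Tuple[str, int, bool]]:
    """Run each subtask until the Reflector accepts it or the retry budget runs out.

    Returns (subtask, attempts, success) for every subtask that was attempted.
    """
    log = []
    subtasks = agents.decompose(instruction, capture_image())
    for subtask in subtasks:
        success, attempts = False, 0
        while not success and attempts < max_retries:
            attempts += 1
            point = agents.localize(subtask, capture_image())  # perceive current scene
            action = agents.think(subtask, point)               # choose an atomic action
            execute(action)                                     # act on the robot
            success = agents.reflect(subtask, capture_image())  # feedback: retry or proceed
        log.append((subtask, attempts, success))
        if not success:
            break  # later steps assume this one succeeded, so stop here
    return log


if __name__ == "__main__":
    # Toy stand-ins: the "world" is a dict counting executed actions,
    # and the very first execution is reported as a failure.
    state = {"calls": 0}

    def capture_image():
        return state

    def execute(action):
        state["calls"] += 1

    agents = Agents(
        decompose=lambda instr, img: ["pick red block", "place red block in bowl"],
        localize=lambda sub, img: (0.5, 0.5),
        think=lambda sub, pt: AtomicAction(sub.split()[0], pt),
        reflect=lambda sub, img: img["calls"] != 1,
    )
    print(run_closed_loop("put the red block in the bowl", capture_image, execute, agents))
    # prints [('pick red block', 2, True), ('place red block in bowl', 1, True)]
```

The retry cap is an assumption added so that a subtask the Reflector keeps rejecting cannot loop forever; the truncated abstract does not say how MALLVI bounds repetition.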


Code & Models

Repositories

iman1234ahmadi/MALLVI (GitHub)
