Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges

Mohammed Elhenawy, Ahmad Abutahoun, Taqwa I. Alhadidi, Ahmed Jaber, Huthaifa I. Ashqar, Shadi Jaradat, Ahmed Abdelhay, Sebastien Glaser, and Andry Rakotonirainy

arXiv:2407.00092 · cs.AI · July 2, 2024

TL;DR

This paper demonstrates how multimodal large language models, equipped with multiple specialized agents, can effectively solve complex combinatorial problems like TSP and mTSP through visual reasoning and zero-shot multi-agent strategies.

Contribution

Introduces a multi-agent framework within MLLMs for solving TSP and mTSP, showcasing improved solution quality and innovative zero-shot in-context problem-solving capabilities.

Findings

1. Multi-Agent 1 significantly improves solution quality for TSP and mTSP.
2. Multi-Agent 2 enables rapid decision-making with iterative refinements.
3. Experimental results demonstrate robust visual reasoning in combinatorial optimization.

Abstract

Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to tackle complex problems, including zero-shot in-context learning scenarios. This study explores the ability of MLLMs to visually solve the Traveling Salesman Problem (TSP) and the Multiple Traveling Salesman Problem (mTSP) using images that portray point distributions on a two-dimensional plane. We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these combinatorial challenges. Our experimental investigation includes rigorous evaluations across zero-shot settings and introduces innovative multi-agent zero-shot in-context scenarios. The results demonstrated that both multi-agent models, Multi-Agent 1, which includes the Initializer, Critic, and Scorer agents, and Multi-Agent 2, which…
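
For readers unfamiliar with the two problems, the quantity being minimized can be sketched in a few lines. This is only an illustration of the TSP and mTSP objectives on 2D points, not the paper's agents or evaluation code; the function names `tour_length` and `mtsp_length`, the sample points, and the convention that every mTSP route starts and ends at a shared depot are assumptions made here.

```python
import math

def tour_length(points, order):
    """Euclidean length of the closed tour that visits `points` in `order`."""
    total = 0.0
    for i in range(len(order)):
        a = points[order[i]]
        b = points[order[(i + 1) % len(order)]]  # wrap around to close the tour
        total += math.dist(a, b)
    return total

def mtsp_length(points, routes, depot=0):
    """Total length over all salesmen; each route leaves from and returns to
    the depot (an assumed convention for this sketch)."""
    return sum(tour_length(points, [depot] + list(route)) for route in routes)

# Four corners of a 4 x 3 rectangle (hypothetical sample data).
points = [(0, 0), (0, 3), (4, 3), (4, 0)]

square = tour_length(points, [0, 1, 2, 3])         # walks the perimeter
crossed = tour_length(points, [0, 2, 1, 3])        # uses both diagonals
two_salesmen = mtsp_length(points, [[1], [2, 3]])  # two routes from depot 0
```

Here the perimeter tour has length 14 while the self-crossing tour has length 18, which is the kind of difference a visual inspection of the plotted route can pick up: crossing edges usually signal a tour that can be shortened.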

Code & Models

Repositories

ahmed-abdulhuy/solving-tsp-and-mtsp-combinatorial-challenges-using-visual-reasoning-and-multi-agent-approach-mllms-
Official

Taxonomy

Topics: Semantic Web and Ontologies · Geographic Information Systems Studies