Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges
Mohammed Elhenawy, Ahmad Abutahoun, Taqwa I. Alhadidi, Ahmed Jaber, Huthaifa I. Ashqar, Shadi Jaradat, Ahmed Abdelhay, Sebastien Glaser, and Andry Rakotonirainy

TL;DR
This paper demonstrates how multimodal large language models, equipped with multiple specialized agents, can effectively solve complex combinatorial problems like TSP and mTSP through visual reasoning and zero-shot multi-agent strategies.
Contribution
Introduces a multi-agent framework within MLLMs for solving TSP and mTSP, showcasing improved solution quality and innovative zero-shot in-context problem-solving capabilities.
Findings
Multi-Agent 1 significantly improves solution quality for TSP and mTSP.
Multi-Agent 2 enables rapid decision-making with iterative refinements.
Experimental results demonstrate robust visual reasoning in combinatorial optimization.
Abstract
Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to tackle complex problems, including zero-shot in-context learning scenarios. This study explores the ability of MLLMs to visually solve the Traveling Salesman Problem (TSP) and the Multiple Traveling Salesman Problem (mTSP) using images that portray point distributions on a two-dimensional plane. We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these combinatorial challenges. Our experimental investigation includes rigorous evaluations across zero-shot settings and introduces innovative multi-agent zero-shot in-context scenarios. The results demonstrated that both multi-agent models, Multi-Agent 1, which includes the Initializer, Critic, and Scorer agents, and Multi-Agent 2, which…
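To make the Initializer, Critic, and Scorer roles named in the abstract more concrete, the following is a minimal sketch of one plausible propose, critique, and score loop for TSP. It is not the authors' implementation: in the paper the agents are MLLMs reasoning over images of the points, whereas here each agent is replaced by a classical stand-in chosen purely for illustration (nearest-neighbor construction for the Initializer, a single 2-opt reversal for the Critic, Euclidean tour length for the Scorer). All function names and the `max_rounds` parameter are hypothetical, and the way the agents hand results to each other is an assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def scorer(points: List[Point], tour: List[int]) -> float:
    """Stand-in Scorer: total Euclidean length of the closed tour."""
    n = len(tour)
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n)
    )


def initializer(points: List[Point]) -> List[int]:
    """Stand-in Initializer: nearest-neighbor tour starting at point 0."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        # Break distance ties by index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def critic(points: List[Point], tour: List[int]) -> List[int]:
    """Stand-in Critic: return the first improving 2-opt reversal, if any."""
    current = scorer(points, tour)
    n = len(tour)
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            if scorer(points, candidate) < current - 1e-12:
                return candidate
    return tour  # no improvement found


def solve(points: List[Point], max_rounds: int = 50) -> Tuple[List[int], float]:
    """Iterate: the Critic proposes a revision, the Scorer accepts or rejects it."""
    tour = initializer(points)
    best = scorer(points, tour)
    for _ in range(max_rounds):
        proposal = critic(points, tour)
        score = scorer(points, proposal)
        if score >= best:
            break  # the Critic could not improve the tour
        tour, best = proposal, score
    return tour, best


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    print(solve(square))
```

Keeping the Scorer separate from the Critic means a proposed revision is only accepted when an independent measurement confirms it is shorter, which is the kind of check a language-model critic cannot be relied on to perform by itself.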
Taxonomy
Topics: Semantic Web and Ontologies · Geographic Information Systems Studies
