Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization
Jie Zhao, Kang Hao Cheong, Witold Pedrycz

TL;DR
This paper introduces a novel framework that transforms graph data into images to leverage multimodal large language models for solving complex graph-structured combinatorial problems with human-like spatial reasoning.
Contribution
It proposes converting graphs into images to utilize MLLMs for combinatorial optimization, demonstrating a new approach that mimics human spatial reasoning in machine problem-solving.
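As a rough illustration of the graph-to-image step, the sketch below renders an edge list as an SVG drawing using only the Python standard library. The circular layout, the function names (`circular_layout`, `graph_to_svg`), and the SVG output format are assumptions made for this example; the summary does not say how the authors render their graphs.

```python
import math


def circular_layout(nodes, radius=100.0, center=(120.0, 120.0)):
    """Place nodes evenly on a circle; returns {node: (x, y)}.

    The circular layout is an illustrative choice, not the paper's.
    """
    n = len(nodes)
    cx, cy = center
    return {
        node: (cx + radius * math.cos(2 * math.pi * i / n),
               cy + radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }


def graph_to_svg(edges, size=240):
    """Render an undirected edge list as an SVG string (hypothetical helper)."""
    nodes = sorted({v for edge in edges for v in edge})
    pos = circular_layout(nodes, radius=size * 0.4, center=(size / 2, size / 2))
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    # Draw edges first so that node circles are painted on top of them.
    for u, v in edges:
        (x1, y1), (x2, y2) = pos[u], pos[v]
        parts.append(
            f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" stroke="gray"/>'
        )
    # Draw each node as a labeled circle.
    for node, (x, y) in pos.items():
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="10" fill="white" stroke="black"/>')
        parts.append(
            f'<text x="{x:.1f}" y="{y + 4:.1f}" text-anchor="middle" font-size="10">{node}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# A triangle (0, 1, 2) with one pendant node (3).
EXAMPLE_EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]
svg = graph_to_svg(EXAMPLE_EDGES)
```

The resulting string can be written to a `.svg` file or rasterized before being passed to a multimodal model; which image format and layout work best is left open here.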
Findings
MLLMs show strong spatial reasoning abilities on graph tasks.
Transforming graphs into images improves problem-solving accuracy.
Simple search techniques combined with MLLMs are effective for complex graph challenges.
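The last finding, pairing a model with a simple search, can be sketched as a propose-and-evaluate loop. Everything in the block below is an assumption made for illustration: the objective (removing a few nodes to shrink the largest connected component) is one example of a graph-structured combinatorial task, and `degree_proposer` is a deterministic stand-in for the step where an MLLM would inspect the graph image and suggest candidate nodes. It is not the authors' algorithm.

```python
from collections import defaultdict


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # iterative depth-first traversal
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def degree_proposer(adj, removed, top=3):
    """Stand-in for the MLLM: suggest the highest-degree remaining nodes."""
    def live_degree(u):
        return sum(1 for v in adj[u] if v not in removed)

    remaining = [u for u in adj if u not in removed]
    return sorted(remaining, key=lambda u: (-live_degree(u), u))[:top]


def greedy_search(edges, budget, propose=degree_proposer):
    """Repeatedly ask `propose` for candidates and keep the best-scoring one."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    adj = dict(neighbors)  # freeze into a plain dict before searching
    removed = set()
    for _ in range(budget):
        candidates = propose(adj, removed)
        if not candidates:
            break
        # Evaluate each candidate exactly; keep the one that helps most.
        best = min(candidates, key=lambda c: largest_component(adj, removed | {c}))
        removed.add(best)
    return removed, largest_component(adj, removed)


# Two triangles joined by the bridge (2, 3).
BRIDGED_TRIANGLES = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
removed_nodes, component_size = greedy_search(BRIDGED_TRIANGLES, budget=1)
```

Swapping `degree_proposer` for a function that queries a multimodal model with a rendered graph image would keep the search loop unchanged, since the loop only requires a list of candidate nodes.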
Abstract
Graph-structured combinatorial challenges are inherently difficult due to their nonlinear and intricate nature, often rendering traditional computational methods ineffective or expensive. However, these challenges can be more naturally tackled by humans through visual representations that harness our innate ability for spatial reasoning. In this study, we propose transforming graphs into images to preserve their higher-order structural features accurately, revolutionizing the representation used in solving graph-structured combinatorial tasks. This approach allows machines to emulate human-like processing in addressing complex combinatorial challenges. By combining the innovative paradigm powered by multimodal large language models (MLLMs) with simple search techniques, we aim to develop a novel and effective framework for tackling such problems. Our investigation into MLLMs spanned a…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
