Exploring MLLMs Perception of Network Visualization Principles
Jacob Miller, Markus Wallinger, Ludwig Felder, Timo Brand, Henry Förster, Johannes Zink, Chunyang Chen, Stephen Kobourov

TL;DR
This study evaluates whether multimodal large language models can perceive network visualization qualities similarly to humans, finding they perform comparably and can even surpass human experts with optimized prompts.
Contribution
It demonstrates that MLLMs can match or exceed human perception of network layout quality, revealing their potential for visual analysis tasks.
Findings
MLLMs perform comparably to human experts in network perception tasks.
Prompt engineering can enhance MLLMs' performance beyond human levels.
MLLMs rely on visual proxies similar to human perceptual cues.
Abstract
In this paper, we test whether Multimodal Large Language Models (MLLMs) can match human-subject performance in tasks involving the perception of properties in network layouts. Specifically, we replicate a human-subject experiment about perceiving quality (namely stress) in network layouts using GPT-4o, Gemini-2.5 and Qwen2.5. Our experiments show that giving MLLMs the same study information as trained human participants yields performance comparable to that of human experts and exceeds that of untrained non-experts. Additionally, we show that prompt engineering that deviates from the human-subject experiment can lead to better-than-human performance in some settings. Interestingly, like human subjects, the MLLMs seem to rely on visual proxies rather than computing the actual value of stress, indicating some sense or facsimile of perception. Explanations from the models are similar to…
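The abstract uses stress as the layout-quality measure without defining it here. A common definition is stress(X) = Σ_{i<j} w_ij (||x_i − x_j|| − d_ij)², where d_ij is the graph-theoretic (shortest-path) distance and w_ij = d_ij^−2. The sketch below computes that quantity, along with a scale-normalized variant that first rescales the drawing by the factor α = Σ(e_ij/d_ij) / Σ(e_ij²/d_ij²) that minimizes stress, so a layout is not penalized simply for being drawn larger or smaller. It is a minimal illustration under stated assumptions: an unweighted, undirected graph, pairs in different components skipped, and hypothetical function names. The exact stress variant used in the study is not specified in this section.

```python
import itertools
import math
from collections import deque


def graph_distances(n, edges):
    """All-pairs shortest-path lengths via BFS (assumes unweighted, undirected graph)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for source in range(n):
        d = [math.inf] * n
        d[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] == math.inf:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def _pairs(pos, edges):
    """Yield (Euclidean distance, graph distance) for every connected node pair."""
    d = graph_distances(len(pos), edges)
    for i, j in itertools.combinations(range(len(pos)), 2):
        if d[i][j] != math.inf:  # assumption: skip pairs in different components
            yield math.dist(pos[i], pos[j]), d[i][j]


def stress(pos, edges, scale=1.0):
    """Weighted stress with w_ij = d_ij^-2, after multiplying the layout by `scale`."""
    return sum((scale * e - d) ** 2 / d ** 2 for e, d in _pairs(pos, edges))


def scale_normalized_stress(pos, edges):
    """Stress at the scale alpha that minimizes it (closed form for w_ij = d_ij^-2)."""
    pairs = list(_pairs(pos, edges))
    num = sum(e / d for e, d in pairs)
    den = sum(e ** 2 / d ** 2 for e, d in pairs)
    alpha = num / den if den > 0 else 1.0  # degenerate layout: all nodes coincide
    return stress(pos, edges, alpha)


if __name__ == "__main__":
    path = [(0, 1), (1, 2)]
    drawing = [(0, 0), (2, 0), (4, 0)]  # a path drawn at twice the ideal edge length
    print(stress(drawing, path))                   # 3.0
    print(scale_normalized_stress(drawing, path))  # 0.0
```

The raw value penalizes the doubled edge lengths, while the scale-normalized value is zero because the drawing is a perfect, merely enlarged, embedding of the path. Comparing layouts by a scale-invariant measure avoids rewarding or punishing a layout for its drawing size alone.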
