AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making
Shusen Liu, Haichao Miao, Zhimin Li, Matthew Olson, Valerio Pascucci, Peer-Timo Bremer

TL;DR
This paper introduces Autonomous Visualization Agents (AVAs) that leverage visual perception in multi-modal LLMs to interpret and achieve user-defined visualization goals, enabling domain experts to generate visualizations through natural language.
Contribution
It presents the first framework for AVAs utilizing visual perception in multi-modal LLMs, demonstrating their applicability across various visualization tasks and scenarios.
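To make the idea of perception-driven decision-making concrete, the following is a minimal sketch of a render, perceive, adjust loop. It is an illustration under stated assumptions, not the authors' framework: `run_agent`, `render`, `perceive`, and `Feedback` are hypothetical names, the "image" is a short list of integer pixel intensities, and the perception step is a hard-coded brightness check standing in for a call to a multi-modal LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Hypothetical result of the perception step."""
    goal_met: bool
    adjustment: int  # suggested change to the opacity parameter (percent)


def render(opacity: int) -> List[int]:
    """Toy renderer: scale fixed pixel intensities by an opacity percentage."""
    data = [50, 100, 150, 200]
    return [v * opacity // 100 for v in data]


def perceive(image: List[int], goal: str) -> Feedback:
    """Stand-in for a multi-modal LLM call (assumption: a real agent would
    send the rendered image and the goal text to a model). Here the goal
    text is ignored and 'visible' simply means mean intensity >= 100."""
    mean = sum(image) // len(image)
    return Feedback(goal_met=mean >= 100, adjustment=20)


def run_agent(
    render_fn: Callable[[int], List[int]],
    perceive_fn: Callable[[List[int], str], Feedback],
    goal: str,
    opacity: int,
    max_steps: int = 10,
) -> Tuple[int, List[int]]:
    """Iterate render -> perceive -> adjust until the goal is judged met
    or the step budget runs out. Returns the final parameter and its history."""
    history = [opacity]
    for _ in range(max_steps):
        image = render_fn(opacity)
        feedback = perceive_fn(image, goal)
        if feedback.goal_met:
            break
        opacity += feedback.adjustment
        history.append(opacity)
    return opacity, history


final_opacity, history = run_agent(
    render, perceive, "make the structure clearly visible", opacity=20
)
print(final_opacity, history)
```

The point of the sketch is the control flow: the agent judges the rendered output rather than the parameters, so the same loop would apply to any parameter a renderer exposes, provided the perception step can assess the resulting image against the stated goal.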
Findings
AVAs can interpret visual outputs to assist in visualization tasks.
Preliminary agents show potential in domain-specific visualization applications.
Expert feedback indicates high practicality and future potential of AVAs.
Abstract
With recent advances in multi-modal foundation models, the previously text-only large language models (LLMs) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization. Our work explores the utilization of the visual perception ability of multi-modal LLMs to develop Autonomous Visualization Agents (AVAs) that can interpret and accomplish user-defined visualization objectives through natural language. We propose the first framework for the design of AVAs and present several usage scenarios intended to demonstrate the general applicability of the proposed paradigm. The addition of visual perception allows AVAs to act as a virtual visualization assistant for domain experts who may lack the knowledge or expertise in fine-tuning visualization outputs. Our preliminary exploration and proof-of-concept agents suggest that this…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Computational and Text Analysis Methods
