The Visualization JUDGE: Can Multimodal Foundation Models Guide Visualization Design Through Visual Perception?
Matthew Berger, Shusen Liu

TL;DR
This paper explores how multimodal foundation models can be used as judges to critique and guide the design of visualizations by leveraging their visual perception capabilities.
Contribution
It introduces a framework for using multimodal foundation models as evaluative judges to improve visualization design through perception and critique.
Findings
Multimodal foundation models (MFMs) can effectively perceive and critique visualizations.
A formalization of the visualization design and optimization space.
Characterization of text-to-image and multimodal language models for visualization guidance.
Abstract
Foundation models for vision and language are the basis of AI applications across numerous sectors of society. The success of these models stems from their ability to mimic human capabilities, namely visual perception in vision models, and analytical reasoning in large language models. As visual perception and analysis are fundamental to data visualization, in this position paper we ask: how can we harness foundation models to advance progress in visualization design? Specifically, how can multimodal foundation models (MFMs) guide visualization design through visual perception? We approach these questions by investigating the effectiveness of MFMs for perceiving visualization, and formalizing the overall visualization design and optimization space. Specifically, we think that MFMs can best be viewed as judges, equipped with the ability to criticize visualizations, and provide us with…
Taxonomy
Topics: Persona Design and Applications · Language, Metaphor, and Cognition · Speech and dialogue systems
