Augmenting a Large Language Model with a Combination of Text and Visual Data for Conversational Visualization of Global Geospatial Data
Omar Mena, Alexandre Kouyoumdjian, Lonni Besançon, Michael Gleicher, Ivan Viola, Anders Ynnerman

TL;DR
This paper introduces a method to enhance large language models with visual and textual data for improved conversational visualization of scientific geospatial data, without requiring model fine-tuning.
Contribution
The authors propose a novel approach to augment LLMs with visual and textual data, enabling accurate question answering in scientific visualization tasks without fine-tuning.
Findings
Effective integration of visual snapshots and text descriptions into LLMs
Improved question answering accuracy in scientific visualization
Applicable to any finalized visualization with textual descriptions
Abstract
We present a method for augmenting a Large Language Model (LLM) with a combination of text and visual data to enable accurate question answering in visualization of scientific data, making conversational visualization possible. LLMs struggle with tasks like visual data interaction, as they lack contextual visual information. We address this problem by merging a text description of a visualization and dataset with snapshots of the visualization. We extract their essential features into a structured text file that is highly compact, yet descriptive enough to appropriately augment the LLM with contextual information, without any fine-tuning. This approach can be applied to any visualization that has already been rendered in its final form, as long as it is associated with some textual description.
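The abstract does not specify the layout of the structured text file or how features are extracted from snapshots. As a rough illustration only, the following Python sketch shows one way a text description and snapshot-derived features could be merged into a compact structured string and placed in front of a user question. All names here (SnapshotFeatures, VisualizationContext, build_prompt), the field schema, and the sample values are hypothetical and are not taken from the paper; the feature extraction step itself is assumed to have happened elsewhere.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SnapshotFeatures:
    """Features describing one snapshot (hypothetical schema, not the paper's)."""
    view: str                     # which part of the globe the snapshot shows
    visible_regions: list[str]    # regions identifiable in this view
    color_legend: dict[str, str]  # color name -> what it encodes


@dataclass
class VisualizationContext:
    """Text description merged with per-snapshot features (assumed structure)."""
    title: str
    dataset_description: str
    snapshots: list[SnapshotFeatures] = field(default_factory=list)

    def to_structured_text(self) -> str:
        # Compact separators keep the context small, which matters because
        # the whole string is sent with every question instead of fine-tuning.
        return json.dumps(asdict(self), separators=(",", ":"), ensure_ascii=False)


def build_prompt(context: VisualizationContext, question: str) -> str:
    """Prepend the structured context to the question (no model call is made)."""
    return (
        "Answer using only the visualization context below.\n"
        f"CONTEXT: {context.to_structured_text()}\n"
        f"QUESTION: {question}"
    )


if __name__ == "__main__":
    # Placeholder values for illustration; they do not come from the paper.
    ctx = VisualizationContext(
        title="Example global map",
        dataset_description="Placeholder description of a global dataset.",
        snapshots=[
            SnapshotFeatures(
                view="Atlantic-centered",
                visible_regions=["Europe", "Africa"],
                color_legend={"red": "high values", "blue": "low values"},
            )
        ],
    )
    print(build_prompt(ctx, "What does red indicate in this view?"))
```

The resulting string would be passed to whichever LLM interface is in use; that call is left out because the abstract does not name a model or API.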
Taxonomy
Topics: Geographic Information Systems Studies
