Captioning Visualizations with Large Language Models (CVLLM): A Tutorial

Giuseppe Carenini; Jordon Johnson; Ali Salamatian

arXiv:2406.19512 · cs.CL · July 1, 2024

TL;DR

This tutorial explores how large language models can be used to automatically generate captions for visualizations, highlighting recent advances, applications, and future directions in the field.

Contribution

It provides a comprehensive overview of applying large language models to visualization captioning, including neural models, transformer architectures, and emerging research directions.

Findings

1. LLMs enable improved visualization captioning.
2. Transformer architectures are central to recent advances.
3. Future research directions are identified for further development.

Abstract

Automatically captioning visualizations is not new, but recent advances in large language models (LLMs) open exciting new possibilities. In this tutorial, after providing a brief review of Information Visualization (InfoVis) principles and past work in captioning, we introduce neural models and the transformer architecture used in generic LLMs. We then discuss their recent applications in InfoVis, with a focus on captioning. Additionally, we explore promising future directions in this field.

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Video Analysis and Summarization

Methods: Focus