Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations
Arjun Srinivasan, Nikhila Nyapathy, Bongshin Lee, Steven M. Drucker, John Stasko

TL;DR
This paper presents a new corpus of natural language utterances collected from participants who were asked how they would request a set of data visualizations, aiding the development and evaluation of natural language interfaces for data visualization.
Contribution
It introduces an empirical dataset of natural language utterances for visualizations, addressing the lack of understanding of how people specify visualizations through language.
Findings
Collected utterances from 102 participants for ten visualizations
Provided a curated corpus for evaluating NLIs in data visualization
Facilitates development of new systems to generate visualizations from language
Abstract
Natural language interfaces (NLIs) for data visualization are becoming increasingly popular both in academic research and in commercial software. Yet, there is a lack of empirical understanding of how people specify visualizations through natural language. To bridge this gap, we conducted an online study with 102 participants. We showed participants a series of ten visualizations for a given dataset and asked them to provide utterances they would pose to generate the displayed charts. The curated list of utterances generated from the study is provided below. This corpus of utterances can be used to evaluate existing NLIs for data visualization as well as for creating new systems and models to generate visualizations from natural language utterances.
