TL;DR
This paper studies how professional journalists perform multi-table data wrangling, develops taxonomies and an actionable framework tailored to their workflows, and highlights the importance of treating tables as first-class objects in data analysis.
Contribution
It introduces the first comprehensive framework for multi-table data wrangling specifically designed for computational journalism, based on an artifact study of journalists' workflows.
Findings
Extensive use of multiple tables in journalistic data analysis
Identification of unique wrangling operations without parallels in existing literature
Development of a framework supporting interactive multi-table data wrangling
Abstract
For the many journalists who use data and computation to report the news, data wrangling is an integral part of their work. Despite an abundance of literature on data wrangling in the context of enterprise data analysis, little is known about the specific operations, processes, and pain points journalists encounter while performing this tedious, time-consuming task. To better understand the needs of this user group, we conduct a technical observation study of 50 public repositories of data and analysis code authored by 33 professional journalists at 26 news organizations. We develop two detailed and cross-cutting taxonomies of data wrangling in computational journalism, for actions and for processes. We observe the extensive use of multiple tables, a notable gap in previous wrangling analyses. We develop a concise, actionable framework for general multi-table data wrangling that includes…
