Comparing Overlapping Data Distributions Using Visualization
Eric Newburger, Niklas Elmqvist

TL;DR
This study compares how different data visualizations help non-experts determine if two samples come from the same distribution, finding that simpler idealized normal curves improve accuracy over more complex visualizations.
Contribution
It provides empirical evidence on which visualization types best support novices in making graphical inferences about overlapping data distributions.
Findings
Idealized normal curves increase accuracy in identifying similar distributions.
More sophisticated visualizations such as histograms and boxplots are less effective for novices.
The study offers insights into visualization design for data interpretation tasks.
Abstract
We present results from a preregistered and crowdsourced user study where we asked members of the general population to determine whether two samples represented using different forms of data visualization are drawn from the same or different populations. Such a task reduces to assessing whether the overlap between the two visualized samples is large enough to suggest similar or different origins. When using idealized normal curves fitted on the samples, it is essentially a graphical formulation of the classic Student's t-test. However, we speculate that using more sophisticated visual representations, such as bar histograms, Wilkinson dot plots, strip plots, or Tukey boxplots, will allow people both to be more accurate at this task and to better understand its meaning. In other words, the purpose of our study is to explore which visualization best scaffolds novices in making…
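To make the link between the overlap judgment and Student's t-test concrete, here is a minimal sketch of the numerical counterpart of comparing two fitted normal curves. It is not taken from the paper: the function name `student_t` and the sample values are hypothetical, and the sketch assumes the pooled-variance (equal variances) form of the test.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a two-sample
    Student's t-test with pooled variance (equal variances assumed)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance from both samples.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    # Standard error of the difference between the two sample means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_err
    return t, df


# Hypothetical samples, chosen only for illustration.
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
t, df = student_t(a, b)
print(t, df)
```

A small absolute t value corresponds to fitted curves that overlap heavily, which is the visual cue for "same population"; a large absolute value corresponds to curves that barely overlap.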
Taxonomy
Topics: Data Visualization and Analytics · Data Analysis with R · Statistics Education and Methodologies
