Towards Scalable Visual Data Wrangling via Direct Manipulation
El Kindi Rezig, Mir Mahathir Mohammad, Nicolas Baret, Ricardo Mayerhofer, Andrew McNutt, Paul Rosen

TL;DR
This paper introduces Buckaroo, a visual data wrangling system that enhances scalability and error detection in large datasets through intelligent sampling, aggregation, and interactive techniques, improving data cleaning workflows.
Contribution
Buckaroo advances visual data wrangling by enabling automatic group discovery, anomaly detection, and scalable interaction techniques tailored for large datasets.
Findings
Prioritizes errors effectively through sampling strategies
Supports multi-layered navigation with pan-and-zoom
Demonstrates usability and scalability on large datasets
Abstract
Data wrangling, the process of cleaning, transforming, and preparing data for analysis, is a well-known bottleneck in data science workflows. A wide range of data wrangling techniques have been proposed to mitigate this challenge. Of particular interest are visual data wrangling tools, in which users prepare data via graphical interactions (such as with visualizations) rather than by writing scripts. We develop a visual data wrangling system, Buckaroo, that expands upon this paradigm by enabling the automatic discovery of interesting groups (e.g., Salary values for Country="Bhutan") and identification of anomalies (e.g., missing values, outliers, and type mismatches) both within and across these groups. Crucially, this allows users to reason about how repairs applied to one group affect other groups in the dataset. A central challenge in visual data wrangling is…
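To make the idea of per-group anomaly identification concrete, here is a minimal sketch in Python. It is not Buckaroo's implementation: the function name `detect_anomalies`, the row-of-dicts data layout, and the use of Tukey's IQR fences for outliers are all assumptions chosen for illustration, since the abstract does not say how anomalies are detected.

```python
from collections import defaultdict
from statistics import quantiles


def detect_anomalies(rows, group_key, value_key, k=1.5):
    """Hypothetical helper: flag anomalies in `value_key` within each group.

    Returns {group: [(row_index, kind), ...]} where kind is one of
    "missing", "type_mismatch", or "outlier".
    """
    # Bucket (row index, value) pairs by the grouping attribute.
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row.get(group_key)].append((idx, row.get(value_key)))

    report = {}
    for group, cells in groups.items():
        flagged, numeric = [], []
        for idx, value in cells:
            if value is None or value == "":
                flagged.append((idx, "missing"))
            elif isinstance(value, bool) or not isinstance(value, (int, float)):
                flagged.append((idx, "type_mismatch"))
            else:
                numeric.append((idx, value))

        # Assumption: outliers are values outside Tukey's IQR fences.
        # Very small groups are skipped because quartiles are unreliable there.
        if len(numeric) >= 4:
            q1, _, q3 = quantiles([v for _, v in numeric], n=4)
            low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
            flagged.extend(
                (idx, "outlier") for idx, v in numeric if v < low or v > high
            )
        report[group] = sorted(flagged)
    return report
```

Calling `detect_anomalies(rows, "Country", "Salary")` on a list of dictionaries returns one list of flagged row indices per country, so a repair to one group can be checked by rerunning the function and comparing the reports for the other groups.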
Taxonomy
Topics: Scientific Computing and Data Management · Data Visualization and Analytics · Cell Image Analysis Techniques
