Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Catherine Yeh, Donghao Ren, Yannick Assogba, Dominik Moritz, Fred Hohman

TL;DR
This paper presents Amplio, an interactive human-in-the-loop tool that systematically identifies and augments empty data spaces in text datasets, enhancing diversity and safety testing of machine learning models.
Contribution
Introduces Amplio, a novel tool with three augmentation techniques to help practitioners explore unknown data spaces and improve model robustness.
Findings
Amplio enables quick and creative data augmentation by red teamers.
The tool improves the diversity and relevance of safety prompts.
Red teamers found Amplio useful for exploring data gaps.
Abstract
Data augmentation is crucial to make machine learning models more robust and safe. However, augmenting data can be challenging as it requires generating diverse data points to rigorously evaluate model behavior on edge cases and mitigate potential harms. Creating high-quality augmentations that cover these "unknown unknowns" is a time- and creativity-intensive task. In this work, we introduce Amplio, an interactive tool to help practitioners navigate "unknown unknowns" in unstructured text datasets and improve data diversity by systematically identifying empty data spaces to explore. Amplio includes three human-in-the-loop data augmentation techniques: Augment With Concepts, Augment by Interpolation, and Augment with Large Language Model. In a user study with 18 professional red teamers, we demonstrate the utility of our augmentation methods in helping generate high-quality, diverse,…
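The abstract does not say how Amplio locates empty data spaces, so the following is only a minimal sketch of one possible approach, not the authors' method. It assumes the text embeddings have already been projected to 2D points, and it treats any cell of a uniform grid over the data's bounding box that contains no points as "empty". The function name `find_empty_cells` and the `grid_size` parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def find_empty_cells(points: List[Point], grid_size: int = 4) -> List[Point]:
    """Return the centers of grid cells that contain no points.

    Hypothetical helper: a uniform grid over the bounding box of the
    (assumed 2D-projected) embeddings; not taken from the paper.
    """
    if not points:
        return []

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1.0 when all points share a coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    # Record which (row, col) cells hold at least one point.
    occupied = set()
    for x, y in points:
        col = min(int((x - x_min) / x_span * grid_size), grid_size - 1)
        row = min(int((y - y_min) / y_span * grid_size), grid_size - 1)
        occupied.add((row, col))

    # Every unoccupied cell is a candidate region to augment toward.
    empty = []
    for row in range(grid_size):
        for col in range(grid_size):
            if (row, col) not in occupied:
                cx = x_min + (col + 0.5) * x_span / grid_size
                cy = y_min + (row + 0.5) * y_span / grid_size
                empty.append((cx, cy))
    return empty
```

In a human-in-the-loop setting, a practitioner could inspect the returned cell centers and decide which gaps are worth filling with new examples; a coarser or finer `grid_size` trades off between a few large gaps and many small ones.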
Taxonomy
Topics: Traffic Prediction and Management Techniques · Human-Automation Interaction and Safety
