Controlling False Discoveries During Interactive Data Exploration
Zheguang Zhao, Lorenzo De Stefani, Emanuel Zgraggen, Carsten Binnig, Eli Upfal, Tim Kraska

TL;DR
This paper introduces Aware, a system that integrates new multiple hypothesis testing control procedures into interactive data exploration tools to reduce false discoveries, addressing limitations of existing methods.
Contribution
It proposes novel control procedures tailored for interactive exploration and integrates them into the Aware system to improve false discovery management.
Findings
Aware effectively controls false discoveries in interactive exploration.
The new procedures outperform existing FDR methods in this context.
Experiments show improved reliability for both experts and novices.
Abstract
Recent tools for interactive data exploration significantly increase the chance that users make false discoveries. The crux is that these tools implicitly allow the user to test a large body of different hypotheses with just a few clicks, thus incurring the issue commonly known in statistics as the multiple hypothesis testing error. In this paper, we propose solutions to integrate multiple hypothesis testing control into interactive data exploration tools. A key insight is that existing methods for controlling the false discovery rate (such as FDR) are not directly applicable to interactive data exploration. We therefore discuss a set of new control procedures that are better suited and integrate them in our system called Aware. By means of extensive experiments using both real-world and synthetic data sets, we demonstrate how Aware can help experts and novice users alike to…
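To make the multiple hypothesis testing problem concrete, the following is a minimal Python sketch, not taken from the paper and not Aware's procedures. It shows (a) how quickly the chance of at least one false discovery grows when several independent tests are each run at level alpha, and (b) the standard Benjamini-Hochberg procedure as an example of a batch FDR method. The function names `prob_any_false_discovery` and `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def prob_any_false_discovery(num_tests, alpha=0.05):
    """Chance of at least one false positive across `num_tests`
    independent tests of true null hypotheses, each at level `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_tests


def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure (illustrative).

    Returns a list of booleans, True where the hypothesis is rejected.
    Note that it needs the full list of p-values before deciding anything.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, round(prob_any_false_discovery(n), 4))
    # Made-up p-values, for illustration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example))
```

Because `benjamini_hochberg` sorts all p-values before rejecting any of them, it assumes the whole batch of tests is known in advance. In an interactive session the hypotheses arrive one at a time and the user acts on each result immediately, which is one way to read the abstract's remark that existing methods are not directly applicable.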
Taxonomy
Topics: Data Visualization and Analytics · Time Series Analysis and Forecasting · Sports Analytics and Performance
