Reinforced Approximate Exploratory Data Analysis

Shaddy Garg; Subrata Mitra; Tong Yu; Yash Gadhia; Arjun Kashettiwar

arXiv:2212.06225 · cs.LG · December 14, 2022 · 1 cites

TL;DR

This paper introduces a deep reinforcement learning framework to optimize sampling strategies in exploratory data analysis, balancing low latency with preservation of analytical insights during interactive data exploration.

Contribution

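The authors state that they are the first to treat sampling in interactive data exploration as an optimization problem and to solve it with DRL, with the aim of lowering query latency while preserving the insights an analyst would reach on the full data.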

Findings

01. Preserves insight flow during analysis

02. Reduces interaction latency significantly

03. Outperforms baseline sampling methods

Abstract

Exploratory data analytics (EDA) is a sequential decision-making process in which analysts choose subsequent queries that might lead to interesting insights, based on the previous queries and their results. Data processing systems often execute the queries on samples to produce results with low latency. Different downsampling strategies preserve different statistics of the data and yield latency reductions of different magnitudes. The optimal choice of sampling strategy often depends on the particular context of the analysis flow and the hidden intent of the analyst. In this paper, we are the first to consider the impact of sampling in interactive data exploration settings, where sampling introduces approximation errors. We propose a Deep Reinforcement Learning (DRL) based framework that optimizes sample selection in order to keep the analysis and insight-generation flow intact.…
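The abstract's point that different downsampling strategies preserve different statistics can be illustrated with a small sketch. This is not the paper's DRL framework. It is a toy comparison of two common strategies (uniform and stratified sampling) on synthetic data, and every name in it (`uniform_sample`, `stratified_sample`, `one_in`, the "common"/"rare" groups) is a hypothetical chosen for illustration.

```python
import random
from collections import Counter


def uniform_sample(rows, one_in, rng):
    """Keep one row in every `one_in`, chosen uniformly at random."""
    k = max(1, len(rows) // one_in)
    return rng.sample(rows, k)


def stratified_sample(rows, one_in, key, rng):
    """Sample each group separately so every group keeps at least one row."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row)
    sample = []
    for members in groups.values():
        k = max(1, len(members) // one_in)
        sample.extend(rng.sample(members, k))
    return sample


def group_counts(sample):
    """Count how many sampled rows fall in each group."""
    return Counter(group for group, _ in sample)


# Synthetic, assumed data: one dominant group and one rare group.
rng = random.Random(0)
rows = [("common", rng.gauss(100, 10)) for _ in range(9990)]
rows += [("rare", rng.gauss(500, 10)) for _ in range(10)]

uniform = uniform_sample(rows, 100, rng)
stratified = stratified_sample(rows, 100, key=lambda row: row[0], rng=rng)

print("uniform groups:   ", dict(group_counts(uniform)))
print("stratified groups:", dict(group_counts(stratified)))
```

Both samples contain 100 rows, so a query over either one scans the same amount of data. The uniform sample is an unbiased basis for global aggregates but may contain no "rare" rows at all, so a group-by query could silently lose that group. The stratified sample always keeps the rare group, but over-represents it, so unweighted global aggregates computed on it are skewed. Which trade-off is acceptable depends on what the analyst asks next, which is the context-dependence the abstract describes.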

Taxonomy

Topics: Data Management and Algorithms · Data Stream Mining Techniques · Time Series Analysis and Forecasting