Selecting Sub-tables for Data Exploration
Kathy Razmadze, Yael Amsterdamer, Amit Somech, Susan B. Davidson, and Tova Milo

TL;DR
This paper introduces a framework for selecting small, informative sub-tables from large data tables to enhance data exploration, using metrics like cell coverage and diversity, with efficient algorithms validated by experiments and user feedback.
Contribution
The paper formalizes the problem of sub-table selection based on informativeness and proposes an efficient algorithm leveraging table embeddings to approximate optimal solutions.
Findings
Efficient algorithms produce high-quality sub-tables according to proposed metrics.
Sub-tables effectively capture prominent association rules and diversity.
User studies confirm the practical usefulness of the selected sub-tables.
Abstract
We present a framework for creating small, informative sub-tables of large data tables to facilitate the first step of data science: data exploration. Given a large data table T, the goal is to create a sub-table of small, fixed dimensions, by selecting a subset of T's rows and projecting them over a subset of T's columns. The question is: which rows and columns should be selected to yield an informative sub-table? We formalize the notion of "informativeness" based on two complementary metrics: cell coverage, which measures how well the sub-table captures prominent association rules in T, and diversity. Since computing optimal sub-tables using these metrics is shown to be infeasible, we give an efficient algorithm which indirectly accounts for association rules using table embedding. The resulting framework can be used for visualizing the complete sub-table, as well as for…
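To make the problem statement concrete, here is a minimal Python sketch of what "selecting a subset of rows and projecting them over a subset of columns" looks like, together with an exhaustive search for the best sub-table of fixed dimensions. Everything named here is an assumption for illustration: the `diversity` function is a simple stand-in (the mean fraction of distinct values per column), not the paper's definition, and cell coverage is omitted because it depends on association rules mined from T. The brute-force search is not the authors' algorithm; it only shows why exact optimization scales badly, since the number of candidates grows combinatorially with the table size.

```python
from itertools import combinations


def select_sub_table(rows, columns, row_ids, col_names):
    """Project the chosen rows over the chosen columns."""
    col_pos = [columns.index(c) for c in col_names]
    return [[rows[r][p] for p in col_pos] for r in row_ids]


def diversity(sub_table):
    """Toy stand-in metric: mean fraction of distinct values per column.

    This is a hypothetical score for illustration, not the paper's metric.
    """
    if not sub_table or not sub_table[0]:
        return 0.0
    n_rows = len(sub_table)
    n_cols = len(sub_table[0])
    total = 0.0
    for j in range(n_cols):
        distinct = {row[j] for row in sub_table}
        total += len(distinct) / n_rows
    return total / n_cols


def best_sub_table_brute_force(rows, columns, k, l):
    """Try every k-row, l-column sub-table and keep the highest-scoring one.

    The candidate count is C(n, k) * C(m, l), which is only workable
    for very small tables.
    """
    best = None
    for row_ids in combinations(range(len(rows)), k):
        for col_names in combinations(columns, l):
            candidate = select_sub_table(rows, columns, row_ids, col_names)
            score = diversity(candidate)
            if best is None or score > best[0]:
                best = (score, row_ids, col_names)
    return best


if __name__ == "__main__":
    columns = ["city", "year", "status"]
    rows = [
        ["Paris", 2020, "ok"],
        ["Paris", 2020, "ok"],
        ["Rome", 2021, "ok"],
        ["Oslo", 2022, "late"],
    ]
    score, row_ids, col_names = best_sub_table_brute_force(rows, columns, k=2, l=2)
    print(score, row_ids, col_names)
    print(select_sub_table(rows, columns, row_ids, col_names))
```

Even on this four-row, three-column toy table the search scores 18 candidates; on a realistic table the count is far too large to enumerate, which is consistent with the abstract's motivation for an efficient approximate method.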
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Data Management and Algorithms · Data Stream Mining Techniques
