Guided Exploration of Data Summaries
Brit Youngmann, Sihem Amer-Yahia, and Aurélien Personnaz

TL;DR
This paper introduces Eda4Sum, a guided, sequential approach to data summarization that improves over traditional one-shot methods by using reinforcement learning and utility maximization, especially for large, diverse datasets.
Contribution
It formalizes the Eda4Sum problem and proposes two methods, Top1Sum and RLSum, that improve data summarization through guided exploration and reinforcement learning.
Findings
RLSum outperforms one-shot summarization in large datasets
Guided exploration improves diversity and utility of summaries
Reinforcement learning effectively trains summarization policies
Abstract
Data summarization is the process of producing interpretable and representative subsets of an input dataset. It is usually performed following a one-shot process with the purpose of finding the best summary. A useful summary contains k individually uniform sets that are collectively diverse to be representative. Uniformity addresses interpretability and diversity addresses representativity. Finding such a summary is a difficult task when data is highly diverse and large. We examine the applicability of Exploratory Data Analysis (EDA) to data summarization and formalize Eda4Sum, the problem of guided exploration of data summaries that seeks to sequentially produce connected summaries with the goal of maximizing their cumulative utility. Eda4Sum generalizes one-shot summarization. We propose to solve it with one of two approaches: (i) Top1Sum which chooses the most useful summary at each…
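To make the idea of sequential, utility-driven exploration concrete, here is a minimal Python sketch of a greedy loop that, at each step, moves to the most useful summary reachable from the current one. It illustrates the general idea only and is not the authors' Top1Sum implementation. The names greedy_exploration, candidates, utility, and cumulative_utility are hypothetical, and so is the choice to count the starting summary toward the cumulative utility; the abstract does not define the utility function or how summaries are connected.

```python
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")  # type of a summary; left abstract on purpose


def greedy_exploration(
    start: S,
    candidates: Callable[[S], Iterable[S]],  # assumed: summaries connected to the current one
    utility: Callable[[S], float],           # assumed: scalar usefulness of a single summary
    steps: int,
) -> List[S]:
    """Return the sequence of summaries visited by a greedy, step-by-step walk."""
    path = [start]
    current = start
    for _ in range(steps):
        options = list(candidates(current))
        if not options:
            break  # nothing connected to explore from here
        # Greedy choice: take the single most useful next summary.
        current = max(options, key=utility)
        path.append(current)
    return path


def cumulative_utility(path: List[S], utility: Callable[[S], float]) -> float:
    """Sum of per-summary utilities along the explored path (start included, by assumption)."""
    return sum(utility(s) for s in path)
```

With a toy setup where summaries are integers, candidates is lambda n: [n + 1, n + 2], and utility is lambda n: -abs(n - 5), the walk moves toward 5 one greedy step at a time. A learned policy, as in the reinforcement learning variant, would replace the max call with a choice that can trade immediate utility for later gains.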
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Semantic Web and Ontologies
