Data-Semantics-Aware Recommendation of Diverse Pivot Tables
Whanhee Cho, Anna Fariha

TL;DR
This paper introduces SAGE, a system that automatically recommends diverse, insightful pivot tables by leveraging data semantics, addressing the challenge of high-dimensional data summarization and improving over prior methods in diversity and scalability.
Contribution
The paper proposes a data-semantics-aware model and a scalable greedy algorithm for recommending diverse, high-utility pivot tables, advancing automated data summarization techniques.
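As a rough illustration of what greedy, budgeted selection of diverse candidates can look like, the sketch below picks k items one at a time, trading each item's utility against its similarity to items already chosen. This is a generic greedy heuristic, not the algorithm from the paper: the function names, the `trade_off` weight, the Jaccard similarity over attribute names, and the utility scores are all hypothetical.

```python
def jaccard(a, b):
    """Similarity between two candidates, measured as attribute overlap."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def greedy_diverse_select(candidates, utility, similarity, k, trade_off=0.5):
    """Pick up to k candidates, one per round, by best marginal score.

    Marginal score = trade_off * utility - (1 - trade_off) * redundancy,
    where redundancy is the highest similarity to anything already selected.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def marginal(c):
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return trade_off * utility(c) - (1 - trade_off) * redundancy

        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidates: (group-by attribute, aggregated attribute).
scores = {
    ("region", "sales"): 0.9,
    ("region", "profit"): 0.8,
    ("product", "units"): 0.6,
}
picks = greedy_diverse_select(list(scores), scores.get, jaccard, k=2)
print(picks)
```

With these made-up scores, the second pick is `("product", "units")` rather than the higher-utility `("region", "profit")`, because the latter shares an attribute with the first pick.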
Findings
SAGE outperforms alternative approaches in experiments.
SAGE efficiently scales to high-dimensional datasets.
Case studies demonstrate qualitative improvements over existing tools.
Abstract
Data summarization is essential to discover insights from large datasets. In spreadsheets, pivot tables offer a convenient way to summarize tabular data by computing aggregates over some attributes, grouped by others. However, identifying attribute combinations that will result in useful pivot tables remains a challenge, especially for high-dimensional datasets. We formalize the problem of automatically recommending insightful and interpretable pivot tables, eliminating the tedious manual process. A crucial aspect of recommending a set of pivot tables is to diversify them. Traditional works inadequately address the table-diversification problem, which leads us to consider the problem of pivot table diversification. We present SAGE, a data-semantics-aware system for recommending k-budgeted diverse pivot tables, overcoming the shortcomings of prior work for top-k recommendations that…
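To make the pivot-table operation mentioned in the abstract concrete, here is a minimal sketch that groups rows by two attributes and aggregates a third. The `pivot` helper, the attribute names, and the sample rows are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict


def pivot(rows, index, columns, values, agg=sum):
    """Aggregate `values`, grouped by the `index` (row) and `columns` attributes."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[index], r[columns])].append(r[values])

    table = defaultdict(dict)
    for (i, c), vals in cells.items():
        table[i][c] = agg(vals)
    return dict(table)


# Hypothetical sales records.
rows = [
    {"region": "East", "quarter": "Q1", "sales": 100},
    {"region": "East", "quarter": "Q1", "sales": 50},
    {"region": "East", "quarter": "Q2", "sales": 70},
    {"region": "West", "quarter": "Q1", "sales": 30},
]
table = pivot(rows, index="region", columns="quarter", values="sales")
print(table)  # {'East': {'Q1': 150, 'Q2': 70}, 'West': {'Q1': 30}}
```

Each choice of row attribute, column attribute, value attribute, and aggregate function gives a different pivot table, which is why the number of candidates grows quickly with the number of attributes.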
Taxonomy
Topics: Data Quality and Management · Recommender Systems and Techniques · Spreadsheets and End-User Computing
