Controlling the Correctness of Aggregation Operations During Sessions of Interactive Analytic Queries
Eric Simon, Bernd Amann, Rutian Liu, Stéphane Gançarski

TL;DR
This paper develops a formal framework with conditions and rules to ensure the correctness of aggregation queries during interactive data analysis sessions, enhancing self-service BI tools to prevent semantically incorrect aggregations.
Contribution
It introduces aggregable properties and generalized summarizability conditions to formally detect and prevent incorrect aggregation operations in analytic queries.
Findings
Defines aggregable properties for attributes in analytic tables.
Introduces generalized summarizability conditions for post-operation attributes.
Provides propagation rules to maintain correctness of aggregations through query transformations.
Abstract
We present a comprehensive set of conditions and rules to control the correctness of aggregation queries within an interactive data analysis session. The goal is to extend self-service data preparation and BI tools to automatically detect semantically incorrect aggregate queries on analytic tables and views built using common analytic operations, including filter, project, join, aggregate, union, difference, and pivot. We introduce aggregable properties to describe, for any attribute of an analytic table, which aggregation functions correctly aggregate the attribute along which sets of dimension attributes. These properties can also be used to formally identify attributes which are summarizable with respect to some aggregation function along a given set of dimension attributes. This is particularly helpful to detect incorrect aggregations of measures obtained through the use of…
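As an informal illustration of the kind of error the abstract describes, the sketch below shows a measure that rolls up correctly under SUM but not under AVG when re-aggregated along a dimension. This is not the paper's formalism: the rows, the `aggregate` helper, and the `mean` helper are hypothetical and chosen only to make the summarizability idea concrete.

```python
from collections import defaultdict

# Hypothetical analytic table: (store, product, amount).
rows = [
    ("s1", "a", 10.0),
    ("s1", "b", 20.0),
    ("s2", "a", 60.0),
]


def aggregate(rows, key_index, value_index, fn):
    """Group rows by one column and apply fn to the grouped values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {key: fn(values) for key, values in groups.items()}


def mean(values):
    return sum(values) / len(values)


# First aggregation step: one value per store.
sum_by_store = aggregate(rows, 0, 2, sum)
avg_by_store = aggregate(rows, 0, 2, mean)

# SUM of per-store sums equals SUM over the base rows,
# so the summed amount can be aggregated again along "store".
total_direct = sum(row[2] for row in rows)
total_rollup = sum(sum_by_store.values())

# AVG of per-store averages differs from AVG over the base rows,
# so re-aggregating the averaged amount with AVG is incorrect here.
avg_direct = mean([row[2] for row in rows])
avg_rollup = mean(list(avg_by_store.values()))

print("SUM direct vs roll-up:", total_direct, total_rollup)
print("AVG direct vs roll-up:", avg_direct, avg_rollup)
```

A tool that tracks, per attribute, which aggregation functions remain valid along which dimensions could flag the second roll-up before it is run instead of returning a plausible but wrong number.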
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Data Stream Mining Techniques
