Aggregation Consistency Errors in Semantic Layers and How to Avoid Them

Zezhou Huang; Pavan Kalyan Damalapati; Eugene Wu

arXiv:2307.00417·cs.DB·July 4, 2023

TL;DR

This paper addresses aggregation consistency errors, such as double counting caused by join fanouts in semantic layers, and proposes weighing, paired with a human-in-the-loop framework, to improve metric accuracy and interpretability.

Contribution

The paper introduces weighing as a core primitive for keeping aggregations consistent across joins in semantic layers, and presents a human-in-the-loop framework for exploring weighing strategies.

Findings

01. Weighing effectively prevents double counting in join fanouts.

02. The human-in-the-loop approach allows iterative refinement of weighing strategies.

03. The method improves accuracy and interpretability of aggregated metrics.

Abstract

Analysts often struggle with analyzing data from multiple tables in a database due to their lack of knowledge on how to join and aggregate the data. To address this, data engineers pre-specify "semantic layers" which include the join conditions and "metrics" of interest with aggregation functions and expressions. However, joins can cause "aggregation consistency issues". For example, analysts may observe inflated total revenue caused by double counting from join fanouts. Existing BI tools rely on heuristics for deduplication, resulting in imprecise and challenging-to-understand outcomes. To overcome these challenges, we propose "weighing" as a core primitive to counteract join fanouts. "Weighing" has been used in various areas, such as market attribution and order management, ensuring metrics consistency (e.g., total revenue remains the same) even for many-to-many joins. The idea is to…
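
The double-counting problem and the effect of weighing can be illustrated with a small sketch. The tables, column names, and the `join` helper below are hypothetical and not taken from the paper. The equal-split rule (each joined row gets weight 1 / fanout of its source row) is an assumption chosen for illustration, since the abstract excerpt above is truncated before it describes the authors' own strategy.

```python
from collections import Counter

# Hypothetical toy tables (illustrative only, not from the paper).
orders = [
    {"order_id": 1, "revenue": 100.0},
    {"order_id": 2, "revenue": 50.0},
]
items = [
    {"order_id": 1, "item": "pen"},
    {"order_id": 1, "item": "ink"},
    {"order_id": 2, "item": "pad"},
]


def join(left, right, key):
    """Naive inner equi-join of two lists of dicts on `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


joined = join(orders, items, "order_id")

# Naive SUM over the join counts order 1 once per matching item (fanout of 2).
naive_total = sum(row["revenue"] for row in joined)

# Assumed strategy: split each order's weight equally across its joined rows,
# so the weights of all rows derived from one order sum to 1.
fanout = Counter(row["order_id"] for row in joined)
for row in joined:
    row["weight"] = 1.0 / fanout[row["order_id"]]

# The weighted SUM matches the total computed on the base table alone.
weighted_total = sum(row["revenue"] * row["weight"] for row in joined)
base_total = sum(o["revenue"] for o in orders)

# Group-by on a column from the joined table still adds up to the base total.
revenue_by_item = {}
for row in joined:
    share = row["revenue"] * row["weight"]
    revenue_by_item[row["item"]] = revenue_by_item.get(row["item"], 0.0) + share

print(naive_total, weighted_total, base_total)
print(revenue_by_item)
```

In this sketch the naive sum over the joined rows exceeds the revenue in `orders`, while the weighted sum equals it. An equal split is only one possible choice; other distributions (for example, proportional to item price) would preserve the total just as well, which is why choosing a strategy calls for user input. Note also that an inner join drops orders with no matching items, a separate issue that weighing alone does not address.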
