Tab-Shapley: Identifying Top-k Tabular Data Quality Insights

Manisha Padala; Lokesh Nagalapatti; Atharv Tyagi; Ramasuri Narayanam; Shiv Kumar Saini

arXiv:2501.06685·cs.LG·January 14, 2025

TL;DR

This paper introduces Tab-Shapley, an efficient, unsupervised method leveraging Shapley values to identify top-k anomalous attribute sets and data quality insights in tabular datasets, addressing complex dependencies without labeled data.

Contribution

We propose a novel, game theory-based framework that efficiently computes attribute contributions to anomalies, overcoming computational challenges and capturing attribute dependencies.

Findings

01. Effective identification of top-k anomaly insights

02. Efficient closed-form Shapley value computation

03. Validated on real-world datasets with ground-truth anomalies

Abstract

We present an unsupervised method for aggregating anomalies in tabular datasets by identifying the top-k tabular data quality insights. Each insight consists of a set of anomalous attributes and the corresponding subsets of records that serve as evidence for the user. Identifying these insight blocks is challenging due to (i) the absence of labeled anomalies, (ii) the exponential size of the subset search space, and (iii) the complex dependencies among attributes, which obscure the true sources of anomalies. Simple frequency-based methods fail to capture these dependencies, leading to inaccurate results. To address this, we introduce Tab-Shapley, a cooperative game theory-based framework that uses Shapley values to quantify the contribution of each attribute to the data's anomalous nature. While calculating Shapley values typically requires exponential time, we show that…
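To make the abstract's point about exponential cost concrete, here is a minimal sketch of exact Shapley value computation by enumerating every coalition of attributes. It is not the paper's method: the toy `flagged` records, the attribute names, and the `coverage` characteristic function are all hypothetical, chosen only to show how a Shapley value assigns each attribute a share of a coalition-level score. The game Tab-Shapley actually defines, and its efficient computation, are not given in the text above.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by brute-force enumeration of coalitions.

    Cost grows as 2^n in the number of players, which is the
    exponential blow-up the abstract refers to.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            # Standard Shapley weight for coalitions of size r: r!(n-r-1)!/n!
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy data: for each record, the attributes flagged as anomalous.
flagged = [
    {"age", "income"},
    {"age", "income"},
    {"zip"},
    {"age"},
]


def coverage(coalition):
    """Hypothetical characteristic function (an assumption, not from the paper):
    the number of records with at least one flagged attribute in the coalition."""
    return sum(1 for rec in flagged if rec & coalition)


if __name__ == "__main__":
    attributes = ["age", "income", "zip"]
    for name, score in shapley_values(attributes, coverage).items():
        print(f"{name}: {score:.3f}")
```

With three attributes the loop visits only a handful of coalitions, but the count doubles with every added attribute, so this brute-force approach is only practical for very small attribute sets.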

Taxonomy

Methods: Sparse Evolutionary Training