A Comprehensive Study of Shapley Value in Data Analytics
Hong Lin, Shixin Wan, Zhongle Xie, Ke Chen, Meihui Zhang, Lidan Shou, Gang Chen

TL;DR
This paper provides a comprehensive analysis of Shapley value applications in data analytics, addressing key challenges, evaluating techniques, and introducing an open-source framework for further research and development.
Contribution
It offers the first detailed study of Shapley value in data analytics, clarifies key variables, challenges, and solutions, and introduces SVBench, a modular framework for SV applications.
Findings
Identifies four main challenges: computation efficiency, approximation error, privacy preservation, and interpretability.
Analyzes and compares existing techniques for each challenge.
Highlights limitations and future directions for applying SV in data analytics.
Abstract
In recent years, Shapley value (SV), a solution concept from cooperative game theory, has found numerous applications in data analytics (DA). This paper presents the first comprehensive study of SV used throughout the DA workflow, clarifying the key variables in defining DA-applicable SV and the essential functionalities that SV can provide for data scientists. We condense four primary challenges of using SV in DA, namely computation efficiency, approximation error, privacy preservation, and interpretability, disentangle the resolution techniques from existing arts in this field, then analyze and discuss the techniques w.r.t. each challenge and the potential conflicts between challenges. We also implement SVBench, a modular and extensible open-source framework for developing SV applications in different DA tasks, and conduct extensive evaluations to validate our analyses and…
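To make the first two challenges concrete, the following is a minimal Python sketch of the standard Shapley value definition alongside a permutation-sampling approximation. It is not taken from the paper and does not use the SVBench API; the function names (exact_shapley, monte_carlo_shapley) and the toy utility are hypothetical and chosen only for illustration. The exact version evaluates every coalition, which is where the computation-efficiency concern comes from, while the sampled version trades that cost for approximation error.

```python
import itertools
import math
import random


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a size-k coalition that excludes p: k!(n-k-1)!/n!
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values


def monte_carlo_shapley(players, utility, num_permutations, seed=0):
    """Approximate Shapley values by averaging over random player orderings."""
    rng = random.Random(seed)
    values = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = frozenset()
        prev = utility(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = utility(coalition)
            values[p] += cur - prev  # marginal contribution in this ordering
            prev = cur
    return {p: v / num_permutations for p, v in values.items()}


def toy_utility(coalition):
    """Hypothetical utility: additive worth plus a bonus when a and b cooperate."""
    worth = {"a": 1.0, "b": 2.0, "c": 3.0}
    bonus = 1.5 if {"a", "b"} <= coalition else 0.0
    return sum(worth[p] for p in coalition) + bonus


if __name__ == "__main__":
    players = ["a", "b", "c"]
    print(exact_shapley(players, toy_utility))
    print(monte_carlo_shapley(players, toy_utility, num_permutations=2000))
```

In this toy game the synergy bonus is split evenly between a and b, and the sampled estimates approach the exact values as num_permutations grows. In a DA setting the players would typically be data points, features, or model contributors, and each utility call could involve training or evaluating a model, which is why the number of utility evaluations dominates the cost.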
Taxonomy
Topics: Data Mining Algorithms and Applications
