Rethinking Data Value: Asymmetric Data Shapley for Structure-Aware Valuation in Data Markets and Machine Learning Pipelines

Xi Zheng; Yinghui Huang; Xiangyu Chang; Ruoxi Jia; Yong Tan

arXiv:2511.12863 · cs.GT · November 18, 2025

TL;DR

This paper introduces Asymmetric Data Shapley (ADS), a new data valuation method that accounts for directional and temporal dependencies in modern ML/AI workflows, improving fairness and accuracy over classical approaches.

Contribution

ADS relaxes the symmetry assumption of classical Data Shapley, enabling structure-aware valuation that respects data dependencies and order in complex ML pipelines.
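The contrast can be written out as follows. This is a sketch in our own notation, not the paper's: Π(D) is the set of all orderings of the dataset D, P_i^π is the set of points that precede point i in ordering π, U is the utility (e.g. model performance), and Ω is a hypothetical nonempty set of admissible orderings. Classical Data Shapley averages each point's marginal contribution over every ordering. Relaxing symmetry in the sense described here amounts to averaging over Ω only.

```latex
\phi_i^{\mathrm{DS}}(U) = \frac{1}{|\Pi(D)|} \sum_{\pi \in \Pi(D)} \Big[ U\big(P_i^{\pi} \cup \{i\}\big) - U\big(P_i^{\pi}\big) \Big]

\phi_i^{\Omega}(U) = \frac{1}{|\Omega|} \sum_{\pi \in \Omega} \Big[ U\big(P_i^{\pi} \cup \{i\}\big) - U\big(P_i^{\pi}\big) \Big],
\qquad \emptyset \neq \Omega \subseteq \Pi(D)
```

When Ω = Π(D) the two coincide. How the admissible orderings are derived from data dependencies is specified in the paper and is not reproduced here.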

Findings

1. ADS outperforms benchmark methods in directional and temporal settings.
2. ADS distinguishes between novel and redundant data contributions.
3. ADS preserves the efficiency and linearity axioms of classical Data Shapley while capturing data dependencies.
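One way to see why averaging over a restricted set of orderings can keep these two axioms is sketched below. This is an illustration in our own notation, not the paper's proof. Fix a nonempty set Ω of orderings π = (π_1, …, π_n) of the dataset D, and let φ_i^Ω(U) be the average over π ∈ Ω of point i's marginal contribution U(P ∪ {i}) − U(P), where P is the set of points preceding i in π. Along any single ordering the marginal contributions telescope, so summing over points gives:

```latex
\sum_{k=1}^{n} \Big[ U(\{\pi_1,\dots,\pi_k\}) - U(\{\pi_1,\dots,\pi_{k-1}\}) \Big] = U(D) - U(\emptyset)
\quad\Longrightarrow\quad
\sum_{i \in D} \phi_i^{\Omega}(U) = \frac{1}{|\Omega|} \sum_{\pi \in \Omega} \big[ U(D) - U(\emptyset) \big] = U(D) - U(\emptyset)

\phi_i^{\Omega}(aU + bV) = a\,\phi_i^{\Omega}(U) + b\,\phi_i^{\Omega}(V)
```

The first line is efficiency. The second, linearity, holds because each marginal contribution is linear in the utility and averaging is linear.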

Abstract

Rigorous valuation of individual data sources is critical for fair compensation in data markets, informed data acquisition, and transparent development of ML/AI models. Classical Data Shapley (DS) provides an essential axiomatic framework for data valuation but is constrained by its symmetry axiom, which assumes that data sources are interchangeable. This assumption fails to capture the directional and temporal dependencies prevalent in modern ML/AI workflows, including the reliance of duplicated or augmented data on original sources and the order-specific contributions in sequential pipelines such as federated learning and multi-stage LLM fine-tuning. To address these limitations, we introduce Asymmetric Data Shapley (ADS), a structure-aware data valuation framework for modern ML/AI pipelines. ADS relaxes symmetry by averaging marginal contributions only over permutations consistent with an…
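The abstract's core mechanism is averaging marginal contributions only over permutations consistent with some ordering, whose exact form is cut off above. A brute-force sketch can illustrate it. Several details are assumptions of ours:

* The ordering is encoded as hypothetical precedence pairs (a, b), meaning a must appear before b.
* The toy utility and the names `asymmetric_shapley` and `is_consistent` are illustrative.
* Exact enumeration is feasible only for a handful of sources.

This is not the authors' algorithm.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Iterable, Sequence, Tuple

# A utility maps a coalition (set of data-source ids) to model performance.
Utility = Callable[[FrozenSet[str]], float]


def is_consistent(order: Sequence[str], precedence: Iterable[Tuple[str, str]]) -> bool:
    """True if every pair (a, b) in `precedence` has a placed before b in `order`."""
    position = {source: k for k, source in enumerate(order)}
    return all(position[a] < position[b] for a, b in precedence)


def asymmetric_shapley(
    sources: Sequence[str],
    utility: Utility,
    precedence: Iterable[Tuple[str, str]] = (),
) -> Dict[str, float]:
    """Average marginal contributions over admissible permutations only.

    With empty `precedence`, every permutation is admissible and this is
    exact classical (permutation-based) Data Shapley.
    """
    precedence = list(precedence)  # materialize so it can be reused per permutation
    totals = {s: 0.0 for s in sources}
    admissible = 0
    for order in permutations(sources):
        if not is_consistent(order, precedence):
            continue  # skip orderings that violate a hypothetical dependency
        admissible += 1
        coalition: FrozenSet[str] = frozenset()
        previous = utility(coalition)
        for s in order:
            coalition = coalition | {s}
            current = utility(coalition)
            totals[s] += current - previous  # marginal contribution of s
            previous = current
    if admissible == 0:
        raise ValueError("precedence constraints admit no permutation")
    return {s: t / admissible for s, t in totals.items()}


# Hypothetical toy example: "dup" is an exact copy of "orig"; "other" is independent.
def toy_utility(coalition: FrozenSet[str]) -> float:
    has_content = bool(coalition & {"orig", "dup"})
    return (1.0 if has_content else 0.0) + (0.5 if "other" in coalition else 0.0)


sources = ["orig", "dup", "other"]
print(asymmetric_shapley(sources, toy_utility))
# → {'orig': 0.5, 'dup': 0.5, 'other': 0.5}
print(asymmetric_shapley(sources, toy_utility, precedence=[("orig", "dup")]))
# → {'orig': 1.0, 'dup': 0.0, 'other': 0.5}
```

With no constraints, the function reduces to classical permutation-based Data Shapley and splits credit evenly between the original and its duplicate. Requiring the original to precede its duplicate gives the shared value to the original, which mirrors the novel-versus-redundant distinction listed among the findings. In both cases the values sum to U(all) − U(∅) = 1.5.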

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Data Quality and Management