Data Overvaluation Attack and Truthful Data Valuation in Federated Learning

Shuyuan Zheng; Sudong Cai; Chuan Xiao; Yang Cao; Jianbin Qin; Masatoshi Yoshikawa; Makoto Onizuka

arXiv:2502.00494 · cs.CR · May 27, 2025

TL;DR

This paper identifies vulnerabilities in federated learning data valuation methods, introduces a strategic attack to overvalue client data, and proposes a Bayesian truthful valuation metric that incentivizes honest reporting.

Contribution

It presents the data overvaluation attack in federated learning and introduces Truth-Shapley, a novel Bayesian data valuation metric that promotes truthful client data reporting.

Findings

01. Existing valuation metrics are vulnerable to overvaluation attacks.

02. Truth-Shapley effectively incentivizes truthful data contribution.

03. Experimental results confirm the robustness of Truth-Shapley.

Abstract

In collaborative machine learning (CML), data valuation, i.e., evaluating the contribution of each client's data to the machine learning model, has become a critical task for incentivizing and selecting positive data contributions. However, existing studies often assume that clients engage in data valuation truthfully, overlooking the practical motivation for clients to exaggerate their contributions. To expose this threat, this paper introduces the data overvaluation attack, which enables strategic clients to have their data significantly overvalued in federated learning, a widely adopted paradigm for decentralized CML. Furthermore, we propose a Bayesian truthful data valuation metric, named Truth-Shapley. Truth-Shapley is the unique metric that guarantees some promising axioms for data valuation while ensuring that clients' optimal strategy is to perform truthful data valuation under…
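To make the notion of data valuation concrete, here is a minimal sketch of the classical Shapley value, on which Shapley-style valuation metrics are built. It is not the paper's attack or the Truth-Shapley metric; the abstract does not describe either mechanism. The client names, the additive `QUALITY` scores, the `manipulated_utility` function, and the `boost` parameter are all hypothetical, chosen only to show how inflating the utility of coalitions that contain one client raises that client's value.

```python
from itertools import combinations
from math import factorial


def shapley_values(clients, utility):
    """Exact Shapley value of each client for a coalition utility function."""
    n = len(clients)
    values = {}
    for c in clients:
        others = [x for x in clients if x != c]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain c.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of c to coalition s.
                total += weight * (utility(s | {c}) - utility(s))
        values[c] = total
    return values


# Hypothetical per-client contributions to model utility (assumption).
QUALITY = {"A": 0.30, "B": 0.20, "C": 0.10}


def honest_utility(coalition):
    """Toy additive utility: the sum of members' quality scores."""
    return sum(QUALITY[c] for c in coalition)


def manipulated_utility(coalition, attacker="C", boost=0.15):
    """Toy manipulation: coalitions containing the attacker look better."""
    u = honest_utility(coalition)
    return u + boost if attacker in coalition else u


clients = ["A", "B", "C"]
honest = shapley_values(clients, honest_utility)
manipulated = shapley_values(clients, manipulated_utility)
```

In this toy setting the honest Shapley values equal the quality scores because the utility is additive, while the manipulated utility raises only client C's value by `boost`. A truthful valuation metric, in the sense the abstract describes, would aim to remove the incentive for such misreporting.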

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Credit Risk and Financial Regulations · Probability and Risk Models