Replication-Robust Payoff-Allocation for Machine Learning Data Markets
Dongge Han, Michael Wooldridge, Alex Rogers, Olga Ohrimenko, Sebastian Tschiatschek

TL;DR
This paper investigates how to allocate payoffs fairly in submodular function-based machine learning data markets, focusing on robustness against data replication and manipulation, and provides theoretical and empirical insights into solution stability.
Contribution
It introduces a systematic study of replication robustness in submodular games and characterizes the robustness of semivalue solution concepts, including the Shapley and Banzhaf values.
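For reference, the standard textbook definitions behind these terms can be written as follows. The notation ($N$ for the player set, $n = |N|$, $v$ for the characteristic function, $\phi_i$ and $\beta_i$ for the Shapley and Banzhaf values) is an assumption made here for illustration and is not taken from the paper itself.

```latex
% Submodularity (diminishing marginal returns) of v : 2^N \to \mathbb{R}:
v(S \cup \{i\}) - v(S) \;\ge\; v(T \cup \{i\}) - v(T)
\qquad \text{for all } S \subseteq T \subseteq N \setminus \{i\}.

% Shapley value of player i:
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr].

% Banzhaf value of player i:
\beta_i(v) \;=\; \frac{1}{2^{\,n-1}} \sum_{S \subseteq N \setminus \{i\}}
  \bigl[v(S \cup \{i\}) - v(S)\bigr].
```

Both are semivalues: each is a weighted sum of a player's marginal contributions in which the weight depends only on the coalition size $|S|$.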
Findings
The paper gives theoretical conditions under which semivalue solutions are robust to replication.
Replication manipulation can undermine payoff fairness.
Empirical validation on ML data markets confirms theoretical insights.
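The replication concern in the findings above can be illustrated with a small sketch. It assumes a toy "coverage" game that is not from the paper: each player holds a set of data items and a coalition's value is the number of distinct items it covers, which is a monotone submodular function. The player names, the holdings, and the helper functions `coverage_value` and `semivalue` are all hypothetical. The sketch computes Shapley and Banzhaf payoffs by brute-force enumeration, then repeats the computation after player `A` registers a duplicate identity `A2` holding the same data.

```python
from itertools import combinations
from math import factorial


def coverage_value(coalition, holdings):
    """Value of a coalition: number of distinct data items it covers."""
    covered = set()
    for player in coalition:
        covered |= holdings[player]
    return len(covered)


def semivalue(holdings, weight):
    """Weighted sum of marginal contributions, enumerated exactly.

    `weight(k, n)` is the weight given to coalitions of size k in an
    n-player game. Exponential in n, so only suitable for toy examples.
    """
    players = sorted(holdings)
    n = len(players)
    payoff = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                gain = (coverage_value(subset + (i,), holdings)
                        - coverage_value(subset, holdings))
                total += weight(k, n) * gain
        payoff[i] = total
    return payoff


def shapley_weight(k, n):
    return factorial(k) * factorial(n - k - 1) / factorial(n)


def banzhaf_weight(k, n):
    return 1.0 / 2 ** (n - 1)


# Hypothetical holdings: A and B overlap on item 2; C holds a unique item.
honest = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
# A replicates its data under a second identity, A2.
replicated = {"A": {1, 2}, "A2": {1, 2}, "B": {2, 3}, "C": {4}}

shapley_honest = semivalue(honest, shapley_weight)
shapley_repl = semivalue(replicated, shapley_weight)
banzhaf_honest = semivalue(honest, banzhaf_weight)
banzhaf_repl = semivalue(replicated, banzhaf_weight)

# Total payoff collected by the replicating player across both identities.
shapley_gain = shapley_repl["A"] + shapley_repl["A2"]
banzhaf_gain = banzhaf_repl["A"] + banzhaf_repl["A2"]

print("Shapley, honest A:", shapley_honest["A"], "replicated total:", shapley_gain)
print("Banzhaf, honest A:", banzhaf_honest["A"], "replicated total:", banzhaf_gain)
```

In this particular toy game, the replicating player's combined Shapley payoff rises above its honest payoff, while its combined Banzhaf payoff does not. This is only a single hand-picked instance; it does not establish any general property of either solution concept.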
Abstract
Submodular functions have been a powerful mathematical model for a wide range of real-world applications. Recently, submodular functions have become increasingly important in machine learning (ML) for modelling notions such as information and redundancy among entities such as data and features. Among these applications, a key question is payoff allocation, i.e., how to evaluate the importance of each entity towards the collective objective. To this end, classic solution concepts from cooperative game theory offer principled approaches to payoff allocation. However, despite the extensive body of game-theoretic literature, payoff allocation in submodular games is relatively under-researched. In particular, an important notion that arises in the emerging submodular applications is redundancy, which may occur from various sources such as abundant data or malicious manipulations where a…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Cryptography and Data Security
