Shapley Homology: Topological Analysis of Sample Influence for Neural Networks
Kaixuan Zhang, Qinglong Wang, Xue Liu, C. Lee Giles

TL;DR
This paper introduces Shapley Homology, a framework that quantifies how individual data samples influence the topological features of data manifolds, aiding understanding of sample importance in neural network training.
Contribution
The paper proposes a novel topological influence measure based on homology and Shapley values, providing a new perspective on data sample impact beyond traditional methods.
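To make the idea concrete, here is a minimal sketch of the 0-dimensional case on a graph. It treats each vertex as a player and assumes that the value of a coalition S is the number of connected components (the 0-th Betti number) of the subgraph induced by S. This characteristic function and the helper names `betti0` and `shapley_betti0` are illustrative assumptions, not the authors' code. The Shapley value is computed exactly by enumerating subsets, so the sketch only suits small graphs.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def betti0(vertices, edges):
    """Number of connected components of the subgraph induced by `vertices` (union-find)."""
    parent = {v: v for v in vertices}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = len(parent)
    for u, w in edges:
        # Only edges with both endpoints inside the induced subgraph count.
        if u in parent and w in parent:
            ru, rw = find(u), find(w)
            if ru != rw:
                parent[ru] = rw
                components -= 1
    return components


def shapley_betti0(nodes, edges):
    """Exact Shapley value of each vertex for the (assumed) game v(S) = betti0(G[S])."""
    n = len(nodes)
    values = {}
    for v in nodes:
        others = [u for u in nodes if u != v]
        total = Fraction(0)
        for k in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of v to coalition S.
                total += weight * (betti0(s | {v}, edges) - betti0(s, edges))
        values[v] = total
    return values


if __name__ == "__main__":
    star_nodes = ["c", "a", "b", "d"]
    star_edges = [("c", "a"), ("c", "b"), ("c", "d")]
    print(shapley_betti0(star_nodes, star_edges))
```

On a star graph with center `c` and three leaves, this sketch assigns `c` a value of -1/2 and each leaf 1/2. The values sum to 1, the number of components of the whole graph, as Shapley efficiency requires. Because this assumed game can yield negative values, the paper's exact construction may differ before influence is read as a probability measure.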
Findings
Samples with higher influence scores affect neural network accuracy more.
Higher entropy in data manifolds correlates with increased learning difficulty.
The experiments cover neural networks that determine graph connectivity and networks that learn regular grammars.
Abstract
Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (iid). Recent research has demonstrated that this assumption can be problematic because it oversimplifies the manifold of structured data. This has motivated research in areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley Homology framework, which provides a quantitative metric for the influence of a sample on the homology of a simplicial complex. By interpreting the influence as a probability measure, we further define an entropy that reflects the complexity of the data manifold. Our empirical studies show that when using the 0-dimensional homology, on neighboring graphs,…
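The entropy step can be sketched as follows. The sketch assumes the influence scores are non-negative and are normalized by their sum into a probability distribution, and it uses the natural logarithm; both choices, and the helper name `influence_entropy`, are assumptions rather than the paper's stated definition. Under this reading, spreading influence uniformly over the samples maximizes the entropy, while concentrating it on a single sample gives zero.

```python
import math


def influence_entropy(scores):
    """Shannon entropy (natural log) of influence scores normalized into a probability distribution.

    Assumption: scores are non-negative with a positive sum; otherwise a ValueError is raised.
    """
    if any(s < 0 for s in scores):
        raise ValueError("this sketch requires non-negative influence scores")
    total = sum(scores)
    if total <= 0:
        raise ValueError("influence scores must have a positive sum")
    entropy = 0.0
    for s in scores:
        p = s / total
        # Zero-probability terms contribute nothing (0 * log 0 is taken as 0).
        if p > 0:
            entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    print(influence_entropy([1, 1, 1, 1]))  # uniform influence over four samples
    print(influence_entropy([5, 0, 0]))     # all influence on one sample
```

The explicit check for negative scores is a deliberate design choice: a Shapley-style influence can be negative, and silently normalizing such values would not give a valid probability measure.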
Taxonomy
Topics: Topological and Geometric Data Analysis · Cell Image Analysis Techniques · Advanced Graph Neural Networks
