Fair Document Valuation in LLM Summaries via Shapley Values
Zikun Ye, Hema Yoganarasimhan

TL;DR
This paper proposes a Shapley value-based framework for fair document attribution in LLM summaries, introducing Cluster Shapley as an efficient approximation method that outperforms standard approaches in accuracy and fairness.
Contribution
The paper introduces Cluster Shapley, a structure-aware approximation algorithm for fair document valuation in LLM-generated summaries. It uses semantic similarity among documents to reduce computation while maintaining attribution accuracy.
Findings
Cluster Shapley outperforms Monte Carlo and Kernel SHAP in efficiency and accuracy.
Simple attribution rules are often highly unfair in LLM summarization.
Structure-aware approximations are promising for scalable fair content attribution.
Abstract
Large Language Models (LLMs) are increasingly used in systems that retrieve and summarize content from multiple sources, such as search engines and AI assistants. While these systems enhance user experience through coherent summaries, they obscure the individual contributions of original content creators, raising concerns about credit attribution and compensation. We address the challenge of valuing individual documents used in LLM-generated summaries by proposing a Shapley value-based framework for fair document valuation. Although theoretically appealing, exact Shapley value computation is prohibitively expensive at scale. To improve efficiency, we develop Cluster Shapley, a simple approximation algorithm that leverages semantic similarity among documents to reduce computation while maintaining attribution accuracy. Using Amazon product review data, we empirically show that…
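The abstract names two ideas that a small example can make concrete: exact Shapley valuation, whose cost grows exponentially with the number of documents, and a cluster-based shortcut that groups similar documents. The sketch below is illustrative only and is not the authors' implementation. The function names `exact_shapley` and `cluster_shapley`, the toy value function `coverage` (which stands in for a real summary-quality score), the hand-assigned clusters, and the even split of a cluster's value among its members are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values by enumerating every coalition.

    `value` maps a frozenset of players to a number. The cost is
    exponential in len(players), which is why approximations are needed.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def cluster_shapley(clusters, value):
    """Hypothetical cluster-level approximation (an assumption, not the paper's code).

    Treats each cluster of similar documents as one player, computes exact
    Shapley values over clusters, then splits each cluster's value evenly
    among its member documents.
    """
    ids = list(clusters)

    def cluster_value(coalition):
        docs = frozenset(d for c in coalition for d in clusters[c])
        return value(docs)

    cluster_phi = exact_shapley(ids, cluster_value)
    return {d: cluster_phi[c] / len(clusters[c]) for c in ids for d in clusters[c]}


# Toy data (hypothetical): each document covers some "facts".
# A and B are near-duplicates; C adds one unique fact.
FACTS = {"A": {1, 2}, "B": {1, 2}, "C": {3}}


def coverage(docs):
    """Toy value function: number of distinct facts covered by the documents."""
    covered = set()
    for d in docs:
        covered |= FACTS[d]
    return len(covered)


exact = exact_shapley(list(FACTS), coverage)
# Clusters are assigned by hand here; a real system might derive them
# from embedding similarity.
approx = cluster_shapley({"c1": ["A", "B"], "c2": ["C"]}, coverage)
```

In this toy case the two near-duplicate documents are interchangeable, so the cluster-level result coincides with the exact one while evaluating the value function over 2 players instead of 3. With real documents and an LLM-based value function, the match would generally be approximate.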
Taxonomy
Topics: Private Equity and Venture Capital · Financial Reporting and Valuation Research · Financial Reporting and XBRL
Methods: Shapley Additive Explanations
