RankSHAP: Shapley Value Based Feature Attributions for Learning to Rank
Tanya Chowdhury, Yair Zick, James Allan

TL;DR
RankSHAP introduces a game-theoretic, axiomatic approach to feature attribution in learning to rank, extending Shapley values to improve interpretability and consistency across ranking models.
Contribution
The paper proposes RankSHAP, a novel axiomatic extension of Shapley values for ranking, and evaluates its effectiveness and alignment with human intuition.
Findings
RankSHAP aligns well with human intuition in user studies.
It satisfies fundamental axioms for ranking feature attribution.
Experimental results show improved consistency across models.
Abstract
Numerous works propose post-hoc, model-agnostic explanations for learning to rank, focusing on ordering entities by their relevance to a query through feature attribution methods. However, these attributions often correlate only weakly with, or even contradict, each other, confusing end users. We adopt an axiomatic game-theoretic approach, popular in the feature attribution community, to identify a set of fundamental axioms that every ranking-based feature attribution method should satisfy. We then introduce RankSHAP, extending classical Shapley values to ranking. We evaluate the RankSHAP framework through extensive experiments on two datasets, multiple ranking methods, and multiple evaluation metrics. Additionally, a user study confirms RankSHAP's alignment with human intuition. We also perform an axiomatic analysis of existing rank attribution algorithms to determine their compliance with our proposed axioms.…
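For reference, the classical Shapley value that the abstract says RankSHAP extends can be written as below. The notation is an assumption of this note, not taken from the summary: N is the set of features, v(S) is the value of the coalition S, and \phi_i(v) is the attribution given to feature i. How RankSHAP defines v for rankings is not spelled out in the text above; the reviews only indicate that NDCG plays that role.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
  \,\Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
```

Each term is the marginal contribution of feature i to a coalition S, weighted by the fraction of feature orderings in which exactly the members of S precede i.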
Peer Reviews
Decision: ICLR 2025 Poster
- Axiomatic Foundation: I appreciate that the authors propose a set of fundamental axioms specifically tailored for ranking feature attributions, drawing inspiration from Shapley values in coalitional game theory. These axioms, namely Rank-Efficiency, Rank-Missingness, Rank-Symmetry, and Rank-Monotonicity, ensure that the attributions are fair, consistent, and meaningful.
- Generalized Ranking Effectiveness Metric (GREM): The authors introduce GREM, a generalized framework for evaluating the eff
- User study caveats: (a) Preconceived Notions: The authors observed that randomly generated feature attributions achieved a higher concordance score in the re-ordering task than their metric evaluation would predict. This suggests that participants might have relied on pre-existing assumptions or biases about the topics, potentially influencing their judgments throughout the experiment. Is this a drawback of the setup that also affects the rest of the observations? (b) Subjectivity: The authors
The introduction of axioms specific to ranking provides a robust framework, distinguishing RankSHAP from other feature attribution methods. The authors incorporate a user study to validate that RankSHAP explanations align with human understanding, which adds practical value.
RankSHAP’s reliance on relevance scores for accurate NDCG calculations could be a limitation in scenarios where relevance is difficult to quantify or subjective. Although RankSHAP was tested in a user study, the evaluation might have limited generalizability due to sample size.
- The paper introduces a thoughtful adaptation of the Shapley value for the ranking domain, defining new ranking-specific properties that enhance SHAP's applicability in ranking contexts.
- The authors have conducted both performance evaluations and a user study to validate their method.
The proposed method, RankSHAP, heavily relies on the KernelSHAP method [1] with modifications to incorporate NDCG for ranking applications. Rather than introducing a fundamentally new method, the paper adapts an existing approach specifically for ranking tasks. While the axiomatic reformulation is valuable, the technical novelty beyond extending KernelSHAP with NDCG remains limited. [1] Scott M Lundberg and Su-In Lee, A Unified Approach to Interpreting Model Predictions, in NeurIPS, 2017.
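To make the idea of pairing Shapley values with NDCG concrete, here is a minimal sketch. It is not the authors' implementation: it enumerates all coalitions exactly instead of using KernelSHAP-style sampling, it masks absent features with a fixed baseline vector, and it takes relevance labels as given. The names `ndcg`, `coalition_value`, `shapley_attributions`, and the toy linear scorer are all hypothetical and introduced only for illustration.

```python
from itertools import combinations
from math import factorial, log2


def ndcg(ranked_relevances):
    """NDCG of relevance labels listed in ranked order (exponential gain, log2 discount)."""
    def dcg(rels):
        return sum((2 ** r - 1) / log2(pos + 2) for pos, r in enumerate(rels))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


def coalition_value(score_fn, docs, relevances, baseline, coalition):
    """NDCG of the ranking obtained when only features in `coalition` keep their values.

    Assumption: features outside the coalition are replaced by `baseline`.
    """
    masked = [[x if j in coalition else baseline[j] for j, x in enumerate(doc)]
              for doc in docs]
    scores = [score_fn(doc) for doc in masked]
    # Stable sort by descending score; ties keep the original document order.
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return ndcg([relevances[i] for i in order])


def shapley_attributions(score_fn, docs, relevances, baseline):
    """Exact Shapley values over features, with NDCG as the coalition value."""
    n = len(baseline)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                gain = (coalition_value(score_fn, docs, relevances, baseline, s | {i})
                        - coalition_value(score_fn, docs, relevances, baseline, s))
                phi[i] += weight * gain
    return phi


# Toy example (hypothetical data): a linear scorer that ignores the third feature.
weights = [1.0, 0.5, 0.0]

def score(doc):
    return sum(w * x for w, x in zip(weights, doc))

docs = [[0.2, 0.9, 0.4], [0.8, 0.1, 0.7], [0.5, 0.5, 0.1]]
relevances = [0, 2, 1]
baseline = [0.0, 0.0, 0.0]

phi = shapley_attributions(score, docs, relevances, baseline)
print(phi)
```

In this sketch the attributions sum to the NDCG of the full ranking minus the NDCG of the all-baseline ranking, and the feature the scorer ignores receives zero attribution. Exact enumeration costs on the order of 2^n model evaluations per feature, which is why sampling-based approximations such as KernelSHAP are used in practice.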
Taxonomy
Topics: Multi-Criteria Decision Making
Methods: Sparse Evolutionary Training · Focus
