Evaluating Search System Explainability with Psychometrics and Crowdsourcing
Catherine Chen, Carsten Eickhoff

TL;DR
This paper uses psychometrics and crowdsourcing to evaluate the explainability of web search systems and proposes SSE (Search System Explainability), a metric whose higher scores correlate with greater interpretability.
Contribution
The paper presents SSE, a human-centered evaluation metric for explainable IR, and demonstrates in a crowdsourced user study that it distinguishes explainable from non-explainable systems.
Findings
SSE can effectively distinguish between explainable and non-explainable systems.
Higher SSE scores correlate with greater interpretability in search systems.
The approach provides a blueprint for explainability evaluation in other AI domains.
Abstract
As information retrieval (IR) systems, such as search engines and conversational agents, become ubiquitous in various domains, the need for transparent and explainable systems grows to ensure accountability, fairness, and unbiased results. Despite recent advances in explainable AI and IR techniques, there is no consensus on the definition of explainability. Existing approaches often treat it as a singular notion, disregarding the multidimensional definition postulated in the literature. In this paper, we use psychometrics and crowdsourcing to identify human-centered factors of explainability in Web search systems and introduce SSE (Search System Explainability), an evaluation metric for explainable IR (XIR) search systems. In a crowdsourced user study, we demonstrate SSE's ability to distinguish between explainable and non-explainable systems, showing that systems with higher scores…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Meta-analysis and systematic reviews
