Simpson's Paradox in Recommender Fairness: Reconciling differences between per-user and aggregated evaluations

Flavien Prost, Ben Packer, Jilin Chen, Li Wei, Pierre Kremp, Nicholas Blumm, Susan Wang, Tulsee Doshi, Tonia Osadebe, Lukasz Heldt, Ed H. Chi, Alex Beutel

arXiv:2210.07755 · cs.IR · October 17, 2022


TL;DR

This paper investigates Simpson's Paradox in recommender system fairness evaluations, revealing how per-user and aggregated fairness metrics can contradict each other, and proposes a distribution matching method to reconcile and estimate these metrics in practice.
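A minimal sketch of how the two evaluations can disagree, assuming a simplified setup of our own: each user has some relevant items from item group A or B, and each relevant item received a made-up exposure value (for example from a position-based exposure model). The data and the names `relevant_exposure`, `per_user_gap`, and `aggregated_gap` are hypothetical, and the metric definitions are simplified illustrations rather than the paper's exact formulas.

```python
from statistics import mean

# Hypothetical toy data (not from the paper): for each user, the exposure
# received by each item relevant to that user, split by item group.
relevant_exposure = {
    "u1": {"A": [0.8, 0.8, 0.8, 0.8], "B": [0.9]},
    "u2": {"A": [0.1], "B": [0.2, 0.2, 0.2, 0.2]},
}


def per_user_gap(data):
    """Mean over users of (mean exposure of relevant A items minus mean
    exposure of relevant B items). Users lacking either group are skipped."""
    gaps = [
        mean(groups["A"]) - mean(groups["B"])
        for groups in data.values()
        if groups["A"] and groups["B"]
    ]
    return mean(gaps)


def aggregated_gap(data):
    """Pool all relevant (user, item) pairs per group across users, then
    compare the two groups' mean exposure."""
    pooled_a = [e for groups in data.values() for e in groups["A"]]
    pooled_b = [e for groups in data.values() for e in groups["B"]]
    return mean(pooled_a) - mean(pooled_b)


print(f"per-user gap (A - B):   {per_user_gap(relevant_exposure):+.2f}")
print(f"aggregated gap (A - B): {aggregated_gap(relevant_exposure):+.2f}")
```

Within each user, group B's relevant items get more exposure (per-user gap of about -0.10), yet pooling flips the sign (aggregated gap of about +0.32). The flip comes from the user distribution: group A items are mostly relevant to u1, who sees high exposure everywhere, while group B items are mostly relevant to u2, who sees low exposure everywhere.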

Contribution

It identifies the paradoxical divergence between per-user and aggregated fairness evaluations and introduces a distribution matching technique to estimate fairness metrics under partial observability.

Findings

1. Per-user and aggregated fairness metrics can lead to opposite conclusions.

2. Distribution matching effectively estimates fairness metrics with partial data.

3. The approach works on both simulated and real-world recommender data.

Abstract

There has been a flurry of research in recent years on notions of fairness in ranking and recommender systems, particularly on how to evaluate whether a recommender allocates exposure equally across groups of relevant items (also known as provider fairness). While this research has laid an important foundation, it gave rise to different approaches depending on whether relevant items are compared per-user/per-query or aggregated across users. Despite both being established and intuitive, we discover that these two notions can lead to opposite conclusions, a form of Simpson's Paradox. We reconcile these notions and show that the tension is due to differences in distributions of users where items are relevant, and break down the important factors of the user's recommendations. Based on this new understanding, practitioners might be interested in either notion, but might face challenges with…
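The abstract attributes the tension to differences in the distributions of users for whom each group's items are relevant. One way to illustrate that idea, under our own assumptions, is a generic reweighting (direct standardization) that gives both groups' relevant pairs the same user distribution before pooling. This is not the paper's distribution matching estimator and it ignores the partial-observability setting; the toy data, the choice of group A's user distribution as the reference, and the names `user_share` and `matched_mean_exposure` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: relevant (user, item_group, exposure) pairs.
# Group "A" items are mostly relevant to u1, group "B" items mostly to u2.
pairs = (
    [("u1", "A", 0.8)] * 4 + [("u1", "B", 0.9)]
    + [("u2", "A", 0.1)] + [("u2", "B", 0.2)] * 4
)


def user_share(pairs, group):
    """Fraction of a group's relevant pairs that come from each user."""
    counts = Counter(user for user, g, _ in pairs if g == group)
    total = sum(counts.values())
    return {user: c / total for user, c in counts.items()}


def matched_mean_exposure(pairs, group, target):
    """Mean exposure of `group`, reweighting each pair by target[u] / share[u]
    so the group's user distribution matches `target`. Users missing from
    `target` are skipped, and the result is renormalized over the users kept."""
    share = user_share(pairs, group)
    num = den = 0.0
    for user, g, exposure in pairs:
        if g != group or user not in target:
            continue
        weight = target[user] / share[user]
        num += weight * exposure
        den += weight
    return num / den


# Assumption: use group A's user distribution as the common reference.
reference = user_share(pairs, "A")

# Each group weighted by its own user distribution is plain pooling.
pooled_gap = (
    matched_mean_exposure(pairs, "A", user_share(pairs, "A"))
    - matched_mean_exposure(pairs, "B", user_share(pairs, "B"))
)
# Both groups weighted by the same reference user distribution.
matched_gap = (
    matched_mean_exposure(pairs, "A", reference)
    - matched_mean_exposure(pairs, "B", reference)
)
print(f"pooled gap (A - B):  {pooled_gap:+.2f}")
print(f"matched gap (A - B): {matched_gap:+.2f}")
```

Here the pooled gap is about +0.32 while the matched gap is about -0.10. When every reference user has relevant items in both groups, the matched gap reduces to a reference-weighted average of each user's within-user gap, which is why it agrees with per-user comparisons in this example.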


Taxonomy

Topics: Game Theory and Voting Systems · Decision-Making and Behavioral Economics · Recommender Systems and Techniques