TL;DR
This paper introduces FULTR, a policy-gradient method that learns fair and unbiased ranking functions from biased implicit feedback by enforcing customizable fairness constraints and utilizing counterfactual estimators.
Contribution
It presents the first approach to address both presentation bias and fairness of exposure simultaneously using a flexible, counterfactual policy-gradient algorithm.
Findings
FULTR learns fair ranking policies from biased implicit feedback.
Counterfactual estimators prove effective when used inside the fairness constraints.
The learned rankings are both accurate and fair in empirical evaluations.
Abstract
While implicit feedback (e.g., clicks, dwell times, etc.) is an abundant and attractive source of data for learning to rank, it can produce unfair ranking policies for both exogenous and endogenous reasons. Exogenous reasons typically manifest themselves as biases in the training data, which then get reflected in the learned ranking policy and often lead to rich-get-richer dynamics. Moreover, even after the correction of such biases, reasons endogenous to the design of the learning algorithm can still lead to ranking policies that do not allocate exposure among items in a fair way. To address both exogenous and endogenous sources of unfairness, we present the first learning-to-rank approach that addresses both presentation bias and merit-based fairness of exposure simultaneously. Specifically, we define a class of amortized fairness-of-exposure constraints that can be chosen based on…
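To make the idea of correcting presentation bias concrete, here is a minimal sketch of an inverse-propensity-scored (IPS) utility estimate computed from clicks. It is an illustration under stated assumptions, not the paper's implementation: the position-bias model `(1 / rank) ** eta`, the DCG-style gain, and the names `examination_propensity` and `ips_dcg` are all hypothetical choices made for this example.

```python
import math


def examination_propensity(rank, eta=1.0):
    """Assumed position-bias model: P(item examined at `rank`) = (1 / rank) ** eta."""
    return (1.0 / rank) ** eta


def ips_dcg(logged_ranking, clicks, new_ranking, eta=1.0):
    """IPS estimate of the DCG of `new_ranking`, using clicks logged under `logged_ranking`.

    Each click is up-weighted by the inverse of the probability that the item
    was examined at the position where it was originally shown.
    """
    # Rank (1-based) at which each item was displayed when the clicks were logged.
    logged_rank = {item: k for k, item in enumerate(logged_ranking, start=1)}
    estimate = 0.0
    for k, item in enumerate(new_ranking, start=1):
        if clicks.get(item, 0):
            weight = 1.0 / examination_propensity(logged_rank[item], eta)
            estimate += weight / math.log2(k + 1)  # DCG discount at the new rank
    return estimate


# Hypothetical log: item "c" was shown at rank 3 and clicked.
logged = ["a", "b", "c"]
clicks = {"c": 1}
kept_order = ips_dcg(logged, clicks, ["a", "b", "c"])
promoted = ips_dcg(logged, clicks, ["c", "a", "b"])
```

Because a click at rank 3 was less likely to be observed in the first place, it counts for more than a click at rank 1 would. In this toy log the estimator therefore favors a ranking that promotes item "c", which counteracts the rich-get-richer effect the abstract describes.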
