Toward Pareto Efficient Fairness-Utility Trade-off in Recommendation through Reinforcement Learning
Yingqiang Ge, Xiaoting Zhao, Lucia Yu, Saurabh Paul, Diane Hu, Chu-Cheng Hsieh, Yongfeng Zhang

TL;DR
This paper introduces MoFIR, a multi-objective reinforcement learning framework that learns Pareto efficient recommendation policies balancing fairness and utility, adaptable to different business preferences.
Contribution
Proposes MoFIR, a novel reinforcement learning approach with a conditioned network to efficiently approximate the Pareto frontier in fairness-aware recommendation.
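To make the idea of a conditioned network concrete, here is a minimal sketch of preference conditioning in a two-objective setting. It is not the authors' implementation: the function names, the linear weighted-sum scalarization, the concatenation of the preference onto the state, and all numeric values are assumptions chosen for illustration.

```python
import random


def sample_preference(rng):
    """Draw a hypothetical preference (w_utility, w_fairness) that sums to 1."""
    w_utility = rng.random()
    return (w_utility, 1.0 - w_utility)


def scalarize(reward_vector, preference):
    """Collapse a (utility, fairness) reward into one scalar by weighted sum.

    Linear scalarization is an assumption of this sketch, not a claim
    about how MoFIR combines its objectives.
    """
    return sum(w * r for w, r in zip(preference, reward_vector))


def conditioned_input(state, preference):
    """Append the preference to the state features, so that a single
    network can be queried under different trade-offs."""
    return list(state) + list(preference)


rng = random.Random(0)
preference = sample_preference(rng)
state = [0.2, -0.1, 0.7]      # made-up state features
reward_vector = (0.8, 0.3)    # made-up (utility, fairness) rewards

x = conditioned_input(state, preference)
scalar_reward = scalarize(reward_vector, preference)
print(x, scalar_reward)
```

Under these assumptions, sampling a fresh preference per training episode would expose one policy to many trade-offs, and at deployment a business could choose a trade-off by supplying its own preference vector as input.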
Findings
In experiments on real-world datasets, MoFIR outperforms the baselines on both fairness and recommendation metrics.
A single trained model generalizes across the Pareto frontier, producing policies for different preference settings.
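As a reminder of what "Pareto frontier" means here, the sketch below filters a set of candidate policies down to the non-dominated ones. The (utility, fairness) pairs are invented for illustration and are not results from the paper; both objectives are assumed to be "higher is better".

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and
    strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (utility, fairness) scores for five candidate policies.
candidates = [(0.90, 0.20), (0.80, 0.50), (0.60, 0.70), (0.70, 0.40), (0.50, 0.60)]
front = pareto_front(candidates)
print(front)  # [(0.9, 0.2), (0.8, 0.5), (0.6, 0.7)]
```

The two dropped candidates are each beaten on both objectives by another point, so no preference over utility and fairness would ever select them. The remaining three represent distinct trade-offs among which a preference must choose.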
Abstract
The issue of fairness in recommendation is becoming increasingly essential as Recommender Systems touch and influence more and more people in their daily lives. In fairness-aware recommendation, most of the existing algorithmic approaches mainly aim at solving a constrained optimization problem by imposing a constraint on the level of fairness while optimizing the main recommendation objective, e.g., CTR. While this alleviates the impact of unfair recommendations, the expected return of an approach may significantly compromise the recommendation accuracy due to the inherent trade-off between fairness and utility. This motivates us to deal with these conflicting objectives and explore the optimal trade-off between them in recommendation. One conspicuous approach is to seek a Pareto efficient solution to guarantee optimal compromises between utility and fairness. Moreover, considering the…
Taxonomy
Methods: Convolution · Experience Replay · Batch Normalization · Adam · Weight Decay · Dense Connections · Deep Deterministic Policy Gradient
