Beyond NDCG: behavioral testing of recommender systems with RecList

Patrick John Chia; Jacopo Tagliabue; Federico Bianchi; Chloe He; Brian Ko

arXiv:2111.09963·cs.IR·March 29, 2022

TL;DR

RecList is a new behavioral testing framework for recommender systems that complements traditional metrics, enabling more nuanced evaluation of real-world performance across different use cases.

Contribution

RecList introduces a scalable, plug-and-play methodology for behavioral testing of recommender systems, applicable both to known algorithms and to black-box commercial systems.

Findings

01. RecList effectively analyzes known algorithms and commercial systems.

02. It reveals nuanced behaviors not captured by standard metrics.

03. Open source release facilitates community adoption.

Abstract

As with most Machine Learning systems, recommender systems are typically evaluated through performance metrics computed over held-out data points. However, real-world behavior is undoubtedly nuanced: ad hoc error analysis and deployment-specific tests must be employed to ensure the desired quality in actual deployments. In this paper, we propose RecList, a behavioral-based testing methodology. RecList organizes recommender systems by use case and introduces a general plug-and-play procedure to scale up behavioral testing. We demonstrate its capabilities by analyzing known algorithms and black-box commercial systems, and we release RecList as an open source, extensible package for the community.
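To make the contrast between an aggregate held-out metric and a behavioral test concrete, here is a minimal sketch in plain Python. It does not use the RecList package or its API: the function names, the toy data, and the "head"/"tail" popularity slicing are all assumptions chosen for illustration. The idea is only that one overall hit rate can hide a slice of inputs on which the model fails completely.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def hit_rate_at_k(predictions: Sequence[Sequence[str]],
                  targets: Sequence[str], k: int) -> float:
    """Aggregate metric: fraction of targets found in the top-k predictions."""
    if not targets:
        return 0.0
    hits = sum(1 for preds, t in zip(predictions, targets) if t in preds[:k])
    return hits / len(targets)


def hit_rate_by_slice(predictions: Sequence[Sequence[str]],
                      targets: Sequence[str],
                      slice_of: Callable[[str], str],
                      k: int) -> Dict[str, float]:
    """Behavioral view: the same metric, reported per slice of the target item."""
    buckets: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [hits, total]
    for preds, t in zip(predictions, targets):
        bucket = buckets[slice_of(t)]
        bucket[0] += int(t in preds[:k])
        bucket[1] += 1
    return {name: hits / total for name, (hits, total) in buckets.items()}


# Hypothetical toy data: four test cases, each with a ranked prediction list.
predictions = [["a", "b"], ["a", "c"], ["b", "a"], ["a", "b"]]
targets = ["a", "a", "a", "z"]

# Assumed slicing rule: items "a" and "b" are popular ("head"), the rest "tail".
popular_items = {"a", "b"}


def popularity_slice(item: str) -> str:
    return "head" if item in popular_items else "tail"


overall = hit_rate_at_k(predictions, targets, k=2)
by_slice = hit_rate_by_slice(predictions, targets, popularity_slice, k=2)

print(overall)   # 0.75
print(by_slice)  # {'head': 1.0, 'tail': 0.0}
```

In this toy case the overall hit rate of 0.75 looks acceptable, while the per-slice view shows that no tail item is ever recommended correctly. A deployment-specific test could then assert a minimum hit rate per slice rather than on the aggregate alone.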

Taxonomy

Methods: High-Order Consensuses