Identifying Offline Metrics that Predict Online Impact: A Pragmatic Strategy for Real-World Recommender Systems
Timo Wilm, Philipp Normann

TL;DR
This paper presents a pragmatic, model-agnostic strategy to identify offline metrics that reliably predict online impact in recommender systems, validated through large-scale e-commerce experiments.
Contribution
It introduces an approach, motivated by Pareto front approximation, for identifying offline metrics that align with online performance. The method is model-agnostic for systems with a neural network backbone.
Findings
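The online experiment finds significant alignment between offline metrics and real-world click-through rate.
A single model serves multiple test groups simultaneously, each with distinct offline performance metrics.
The strategy is validated in a large-scale online experiment on session-based recommendations on the OTTO e-commerce platform.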
Abstract
A critical challenge in recommender systems is to establish reliable relationships between offline and online metrics that predict real-world performance. Motivated by recent advances in Pareto front approximation, we introduce a pragmatic strategy for identifying offline metrics that align with online impact. A key advantage of this approach is its ability to simultaneously serve multiple test groups, each with distinct offline performance metrics, in an online experiment controlled by a single model. The method is model-agnostic for systems with a neural network backbone, enabling broad applicability across architectures and domains. We validate the strategy through a large-scale online experiment in the field of session-based recommender systems on the OTTO e-commerce platform. The online experiment identifies significant alignments between offline metrics and real-world click-through…
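To make the idea of "alignment" concrete, the following is a minimal sketch of one way to check whether an offline metric orders test groups the same way the online click-through rate does, using a Spearman rank correlation. It is an illustration under stated assumptions, not the authors' procedure: the metric names (recall_at_20, mrr_at_20), the number of test groups, and every number below are hypothetical placeholders, and the paper's actual statistical analysis is not described in this summary.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL values, one entry per test group; not taken from the paper.
offline_metrics = {
    "recall_at_20": [0.30, 0.32, 0.35, 0.37],
    "mrr_at_20": [0.20, 0.18, 0.17, 0.15],
}
online_ctr = [0.050, 0.052, 0.055, 0.058]

alignment = {
    name: spearman(values, online_ctr)
    for name, values in offline_metrics.items()
}

for name, rho in alignment.items():
    print(f"{name}: Spearman rho = {rho:+.2f}")
```

In this toy setup the first metric ranks the groups exactly as the online click-through rate does, while the second ranks them in reverse, so only the first would be a candidate offline proxy. With only a handful of test groups, a rank correlation like this is a coarse signal, and a real analysis would also need significance testing.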
