Off-policy Evaluation for Payments at Adyen
Alex Egg

TL;DR
This paper shows how Off-Policy Evaluation (OPE) accelerates recommender system development at Adyen: it evaluates new variants on large-scale historical transaction data, reduces reliance on slow A/B testing, and yields estimates reliable enough for decision-making.
Contribution
It demonstrates the practical deployment of OPE in a high-volume payment environment, including benchmarking estimators and addressing real-world challenges.
Findings
Strong correlation between OPE estimates and A/B test results
Projected incremental transactions of 9-54 million over six months
Guidance on effective OPE estimator selection for large-scale systems
Abstract
This paper demonstrates the successful application of Off-Policy Evaluation (OPE) to accelerate recommender system development and optimization at Adyen, a global leader in financial payment processing. Facing the limitations of traditional A/B testing, which proved slow, costly, and often inconclusive, we integrated OPE to enable rapid evaluation of new recommender system variants using historical data. Our analysis, conducted on a billion-scale dataset of transactions, reveals a strong correlation between OPE estimates and online A/B test results, projecting an incremental 9-54 million transactions over a six-month period. We explore the practical challenges and trade-offs associated with deploying OPE in a high-volume production environment, including leveraging exploration traffic for data collection, mitigating variance in importance sampling, and ensuring scalability through the…
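To make the importance-sampling idea in the abstract concrete, here is a minimal sketch of three textbook OPE estimators: inverse propensity scoring (IPS), clipped IPS, and self-normalized IPS (SNIPS). The abstract does not say which estimators were benchmarked, so this is an illustration of the general technique rather than Adyen's implementation. The function names, the three-action setup, the reward rates, and both policies are assumptions made up for the example; the logged data is synthetic, with a uniform logging policy standing in for exploration traffic.

```python
import random


def importance_weights(target_probs, logging_probs, clip=None):
    """Per-sample weights pi_target(a|x) / pi_logging(a|x), optionally clipped."""
    weights = [t / l for t, l in zip(target_probs, logging_probs)]
    if clip is not None:
        # Clipping caps large weights: lower variance, at the cost of bias.
        weights = [min(w, clip) for w in weights]
    return weights


def ips_estimate(rewards, target_probs, logging_probs, clip=None):
    """Inverse propensity scoring: mean of weight * reward over the log."""
    weights = importance_weights(target_probs, logging_probs, clip)
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def snips_estimate(rewards, target_probs, logging_probs):
    """Self-normalized IPS: divide by the sum of weights instead of n."""
    weights = importance_weights(target_probs, logging_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


def simulate_logs(n, seed=0):
    """Synthetic log (hypothetical setup): 3 actions, uniform logging policy."""
    rng = random.Random(seed)
    reward_rate = [0.2, 0.5, 0.8]        # assumed success rate per action
    logging_policy = [1 / 3, 1 / 3, 1 / 3]  # uniform exploration
    target_policy = [0.1, 0.1, 0.8]      # candidate policy to evaluate
    rewards, target_probs, logging_probs = [], [], []
    for _ in range(n):
        action = rng.randrange(3)        # action drawn from the logging policy
        success = rng.random() < reward_rate[action]
        rewards.append(1.0 if success else 0.0)
        target_probs.append(target_policy[action])
        logging_probs.append(logging_policy[action])
    # Ground truth is known only because the data is simulated.
    true_value = sum(p * r for p, r in zip(target_policy, reward_rate))
    return rewards, target_probs, logging_probs, true_value


rewards, target_probs, logging_probs, true_value = simulate_logs(50_000)
ips_value = ips_estimate(rewards, target_probs, logging_probs)
snips_value = snips_estimate(rewards, target_probs, logging_probs)
clipped_value = ips_estimate(rewards, target_probs, logging_probs, clip=1.0)

print(f"true value : {true_value:.3f}")
print(f"IPS        : {ips_value:.3f}")
print(f"SNIPS      : {snips_value:.3f}")
print(f"clipped IPS: {clipped_value:.3f}")
```

In this toy setup IPS and SNIPS should land close to the true value, while clipping at 1.0 visibly underestimates it because the largest weight (2.4) is truncated. That is the variance-versus-bias trade-off the abstract alludes to when it mentions mitigating variance in importance sampling.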
