Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning
Akhila Vangara, Alex Egg

TL;DR
This paper investigates regression oracles for non-uniform exploration in payment processing, finding performance improvements alongside challenges such as policy oscillation and data shift.
Contribution
It provides a detailed analysis of regression oracle-based approaches in real-world payment systems, highlighting their benefits and limitations within the ERM framework.
Findings
Regression oracles improve initial policy performance.
Policy performance can degrade over iterations due to data shifts.
Oscillation effects can cause fluctuations in policy effectiveness.
Abstract
Uniform random exploration in decision-making systems supports off-policy learning via supervision but incurs high regret, making it impractical for many applications. Conversely, non-uniform exploration offers better immediate performance but lacks support for off-policy learning. Recent research suggests that regression oracles can bridge this gap by combining non-uniform exploration with supervised learning. In this paper, we analyze these approaches under the Empirical Risk Minimization (ERM) framework in a real-world industrial context at Adyen, a large global payments processor whose setting is characterized by batch-logged delayed feedback, short-term memory, and dynamic action spaces. Our analysis reveals that while regression oracles significantly improve performance, they introduce challenges due to rigid algorithmic assumptions. Specifically, we observe that as a policy improves,…
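To make the idea of pairing a regression oracle with non-uniform exploration concrete, here is a minimal sketch of inverse gap weighting, one well-known scheme of this kind. The abstract excerpt does not say which algorithm the authors use, so this is an illustrative assumption rather than the paper's method. The function name `inverse_gap_weighting`, the parameter `gamma`, and the example reward values are all hypothetical; the predicted rewards stand in for the output of any supervised regression model.

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Turn regression-oracle reward predictions into action probabilities.

    predicted_rewards: one predicted reward per available action
        (hypothetical values; in practice these come from a fitted regressor).
    gamma: exploration parameter (assumed name). gamma = 0 gives uniform
        exploration; larger gamma concentrates mass on the greedy action.
    """
    k = len(predicted_rewards)
    best = max(range(k), key=lambda a: predicted_rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            # Actions with a larger predicted gap to the best are explored less.
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (k + gamma * gap)
    # The greedy action receives the remaining probability mass (at least 1/k).
    probs[best] = 1.0 - sum(probs)
    return probs


# Hypothetical predictions for three candidate actions.
example_probs = inverse_gap_weighting([0.9, 0.8, 0.5], gamma=10.0)
uniform_probs = inverse_gap_weighting([0.9, 0.8, 0.5], gamma=0.0)
```

Because the action set is passed in per call, the sketch also accommodates an action space that changes between decisions. The sampling probabilities would need to be logged alongside each decision if they are to be reused later for off-policy evaluation.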
Taxonomy
Topics: Customer churn and segmentation · Organizational and Employee Performance · Technology Adoption and User Behaviour
