TL;DR
This paper introduces a D-optimal design approach for online recommender selection that improves exploration efficiency by maximizing information gain. Its effectiveness is demonstrated through simulations and a real-world deployment on Walmart.com.
Contribution
It adapts D-optimal design to online recommender systems. This addresses a limitation of existing bandit methods, which ignore problem structure, by maximizing information gain in a way that exploits the structure of the recommendation setting.
Findings
Enhanced exploration efficiency demonstrated in simulations.
Improved recommendation performance in Walmart.com deployment.
Reproducible results with published code and data.
Abstract
Selecting the optimal recommender via online exploration-exploitation is attracting increasing attention because traditional A/B testing can be slow and costly, and offline evaluations are prone to bias from historical data. Finding the optimal online experiment is nontrivial because both the users and the displayed recommendations carry contextual features that are informative of the reward. While the problem can be formalized through the lens of multi-armed bandits, existing solutions are less satisfactory because general methodologies do not account for case-specific structures, particularly in the e-commerce recommendation setting we study. To fill this gap, we leverage D-optimal design from the classical statistics literature to achieve maximum information gain during exploration, and reveal how it fits seamlessly with the modern infrastructure of online…
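To make "maximum information gain during exploration" concrete, below is a minimal sketch of greedy D-optimal selection. It is not the authors' algorithm. It assumes a linear reward model in which each candidate recommender (arm) is summarized by a feature vector, and at each step it picks the arm that most increases the determinant of the regularized information matrix A = reg·I + Σ x xᵀ. The function names (`det`, `add_outer`, `greedy_d_optimal`), the ridge term `reg`, and the toy feature vectors are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def det(matrix: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def add_outer(a: Matrix, x: Sequence[float]) -> Matrix:
    """Return a + x x^T (the information added by observing arm x once)."""
    d = len(x)
    return [[a[i][j] + x[i] * x[j] for j in range(d)] for i in range(d)]


def greedy_d_optimal(
    candidates: Sequence[Sequence[float]], budget: int, reg: float = 1.0
) -> Tuple[List[int], Matrix]:
    """Greedily choose `budget` arms, each maximizing det(A + x x^T)."""
    d = len(candidates[0])
    # Ridge-regularized information matrix, starting at reg * I.
    info = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    chosen: List[int] = []
    for _ in range(budget):
        best = max(
            range(len(candidates)),
            key=lambda k: det(add_outer(info, candidates[k])),
        )
        chosen.append(best)
        info = add_outer(info, candidates[best])
    return chosen, info


if __name__ == "__main__":
    # Hypothetical arms: arm 1 nearly duplicates arm 0; arm 2 is orthogonal.
    arms = [[1.0, 0.0], [0.9, 0.1], [0.0, 0.8]]
    picks, _ = greedy_d_optimal(arms, budget=2)
    print(picks)  # [0, 2]
```

In the toy run, the second pick is the orthogonal arm 2 rather than the near-duplicate arm 1: arm 2 raises the determinant to 3.28, while arm 1 reaches only 2.83. This is the sense in which a D-optimal criterion spends exploration on directions of feature space that are still uncertain. Greedy maximization of det(A + x xᵀ) is one common approximation. The paper's actual procedure may differ.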
