Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking
Qing Zhang, Alex Deng, Michelle Du, Huiji Gao, Liwei He, Sanjeev Katariya

TL;DR
This paper introduces interleaving and counterfactual evaluation techniques that improve the efficiency and sensitivity of search ranking experiments at Airbnb, enabling faster and more accurate selection of candidates for A/B testing.
Contribution
It presents interleaving and counterfactual evaluation methods that make online ranking assessment faster and more accurate, addressing the slow accumulation of statistical power that limits traditional A/B testing on conversion-based metrics.
Findings
Increased experiment sensitivity by up to 100 times
Reduced time and cost for candidate evaluation
Provided practical insights for production deployment
Abstract
Evaluation plays a crucial role in the development of ranking algorithms for search and recommender systems. It enables online platforms to create user-friendly features that drive commercial success in a steady and effective manner. The online environment is particularly conducive to applying causal inference techniques, such as randomized controlled experiments (known as A/B tests), which are often more challenging to implement in fields like medicine and public policy. However, businesses face unique challenges in running A/B tests effectively. Specifically, achieving sufficient statistical power for conversion-based metrics can be time-consuming, especially for significant purchases like booking accommodations. While offline evaluations are quicker and more cost-effective, they often lack accuracy and are inadequate for selecting candidates for A/B tests. To address these…
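The abstract's point about statistical power can be made concrete with a rough calculation. The sketch below is illustrative only and is not taken from the paper: it uses the standard normal-approximation sample-size formula for a two-sided test comparing two proportions, and the function name `sample_size_per_arm`, the 2% baseline conversion rate, and the 1% relative lift are all hypothetical values chosen for the example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a relative lift in a
    conversion rate with a two-sided two-proportion z-test.

    Hypothetical helper for illustration; not the paper's methodology.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    # Critical values of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two arms.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Assumed numbers: 2% baseline conversion, 1% relative improvement.
    print(sample_size_per_arm(0.02, 0.01))
```

Under these assumed numbers the required sample runs into the millions of users per arm, and because the requirement scales with the inverse square of the effect size, halving the detectable lift roughly quadruples it. This is the kind of cost that motivates more sensitive online evaluation methods.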
