Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking

Qing Zhang; Alex Deng; Michelle Du; Huiji Gao; Liwei He; Sanjeev Katariya

arXiv:2508.00751 · cs.IR · August 4, 2025

TL;DR

This paper introduces interleaving and counterfactual evaluation techniques to improve the efficiency and sensitivity of search ranking experiments on Airbnb, enabling faster and more accurate candidate selection for A/B testing.
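Interleaving compares two rankers inside the same search results page instead of across separate user groups. As a minimal sketch, assuming the widely used team-draft scheme (the paper's own variant, and how it credits bookings rather than clicks, may differ), the code below merges two rankings, records which ranker contributed each listing, and scores one search by counting credited clicks. The function names, listing IDs, and the use of clicks as the signal are hypothetical.

```python
import random

# Hypothetical helpers: team-draft is one standard interleaving scheme,
# not necessarily the variant used in the paper.


def team_draft_interleave(ranking_a, ranking_b, k, rng=random):
    """Merge two rankings into one list of at most k items (team-draft).

    The ranker with fewer picks so far goes next (coin flip on ties) and adds
    its highest-ranked item not yet shown. Returns the merged list and a dict
    crediting each shown item to "A" or "B".
    """
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    merged, credit = [], {}
    n_unique = len(set(ranking_a) | set(ranking_b))
    while len(merged) < min(k, n_unique):
        if picks["A"] < picks["B"] or (picks["A"] == picks["B"] and rng.random() < 0.5):
            team = "A"
        else:
            team = "B"
        choice = next((x for x in rankings[team] if x not in credit), None)
        if choice is None:
            # This ranker has no unshown items left; the other one picks.
            team = "B" if team == "A" else "A"
            choice = next(x for x in rankings[team] if x not in credit)
        merged.append(choice)
        credit[choice] = team
        picks[team] += 1
    return merged, credit


def interleaving_preference(credit, clicked):
    """Score one search: +1 if A's items got more clicks, -1 if B's did, 0 on a tie."""
    wins_a = sum(1 for item in clicked if credit.get(item) == "A")
    wins_b = sum(1 for item in clicked if credit.get(item) == "B")
    return (wins_a > wins_b) - (wins_a < wins_b)


if __name__ == "__main__":
    rng = random.Random(7)
    merged, credit = team_draft_interleave(["h1", "h2", "h3"], ["h2", "h4", "h1"], k=4, rng=rng)
    print(merged, credit)
    print(interleaving_preference(credit, clicked=[merged[0]]))
```

Aggregated over many searches, the balance of +1 and -1 outcomes indicates which ranker users prefer. Because both rankers are judged on the same users and queries, between-user variance largely cancels, which is the usual explanation for interleaving's higher sensitivity compared with a standard A/B split.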

Contribution

The paper presents interleaving and counterfactual evaluation methods that make online ranking assessments faster and more accurate, addressing a key limitation of traditional A/B testing: the long time needed to reach statistical power on conversion-based metrics.

Findings

01 Increased experiment sensitivity by up to 100 times

02 Reduced time and cost for candidate evaluation

03 Provided practical insights for production deployment

Abstract

Evaluation plays a crucial role in the development of ranking algorithms on search and recommender systems. It enables online platforms to create user-friendly features that drive commercial success in a steady and effective manner. The online environment is particularly conducive to applying causal inference techniques, such as randomized controlled experiments (known as A/B tests), which are often more challenging to implement in fields like medicine and public policy. However, businesses face unique challenges when it comes to effective A/B testing. Specifically, achieving sufficient statistical power for conversion-based metrics can be time-consuming, especially for significant purchases like booking accommodations. While offline evaluations are quicker and more cost-effective, they often lack accuracy and are inadequate for selecting candidates for A/B tests. To address these…
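The power problem described in the abstract can be made concrete with the standard normal-approximation sample-size formula for a two-sided test comparing two conversion rates. The sketch below uses a hypothetical 2% baseline booking conversion rate and a hypothetical 1% relative lift; these are illustrative inputs, not figures from the paper, and the function name is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def samples_per_arm(p_control, relative_lift, alpha=0.05, power=0.8):
    """Approximate users per arm for a two-sided two-proportion z-test.

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p_treat = p_control * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    delta = p_treat - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: 2% baseline booking conversion, 1% relative lift.
    print(samples_per_arm(0.02, 0.01))
    # Halving the detectable lift roughly quadruples the required traffic.
    print(samples_per_arm(0.02, 0.005))
```

With these assumed inputs the formula asks for roughly 7.7 million users per arm. Because the required sample grows with the inverse square of the effect size, small lifts on rare conversions such as bookings quickly become expensive to detect, which is the gap that more sensitive online methods aim to close.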
