Validating Synthetic Usage Data in Living Lab Environments

Timo Breuer; Norbert Fuhr; Philipp Schaer

arXiv:2310.07142·cs.IR·October 12, 2023

TL;DR

This paper presents a method to validate synthetic user interaction data generated by click models in data-sparse living lab environments, enabling reliable evaluation of retrieval systems with limited user data.

Contribution

It introduces an evaluation approach for validating click model-generated data against known system rankings in human-in-the-loop settings with sparse data.

Findings

01

Simple click models can reliably determine relative system performance from as few as 20 logged sessions.

02

Complex click models need more data but perform better in simulated experiments.

03

Distinguishing between diverse systems is easier than reproducing identical rankings.

Abstract

Evaluating retrieval performance without editorial relevance judgments is challenging, but user interactions can be used as relevance signals instead. Living labs offer a way for small-scale platforms to validate information retrieval systems with real users. If enough user interaction data are available, click models can be parameterized from historical sessions to evaluate systems before exposing users to experimental rankings. However, interaction data are sparse in living labs, and little is known about how click models can be validated for reliable user simulations when click data are available only in moderate amounts. This work introduces an evaluation approach for validating synthetic usage data generated by click models in data-sparse human-in-the-loop environments like living labs. We ground our methodology on the click model's estimates about a system ranking compared to a…
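
To make the idea of parameterizing a click model from historical sessions concrete, here is a minimal sketch. It assumes a document-based click-through-rate model, one of the simplest click model families, and is not the authors' implementation. The function names (`fit_dctr`, `simulate_clicks`, `expected_clicks`) and the session layout `(query, ranking, clicked_docs)` are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def fit_dctr(sessions):
    """Estimate P(click | query, doc) as clicks / impressions from logged sessions.

    Each session is a (query, ranking, clicked_docs) tuple (assumed layout).
    """
    clicks = defaultdict(int)
    shows = defaultdict(int)
    for query, ranking, clicked in sessions:
        for doc in ranking:
            shows[(query, doc)] += 1
            if doc in clicked:
                clicks[(query, doc)] += 1
    return {key: clicks[key] / shows[key] for key in shows}


def simulate_clicks(model, query, ranking, rng, default=0.0):
    """Sample synthetic clicks for a ranking; unseen documents use `default`."""
    return [doc for doc in ranking
            if rng.random() < model.get((query, doc), default)]


def expected_clicks(model, query, ranking, default=0.0):
    """Expected number of clicks the model predicts for a (truncated) ranking."""
    return sum(model.get((query, doc), default) for doc in ranking)


# Toy log: three sessions for one query (fabricated purely for illustration).
sessions = [
    ("q1", ["d1", "d2", "d3"], {"d1"}),
    ("q1", ["d1", "d2", "d3"], {"d1", "d2"}),
    ("q1", ["d3", "d1", "d2"], set()),
]
model = fit_dctr(sessions)

# Two hypothetical systems, compared on their top-2 results for the same query.
system_a = ["d1", "d2"]
system_b = ["d3", "d2"]
score_a = expected_clicks(model, "q1", system_a)
score_b = expected_clicks(model, "q1", system_b)

rng = random.Random(0)
synthetic = simulate_clicks(model, "q1", system_a, rng)
```

In this toy log the model prefers `system_a`, because `d1` was clicked in two of its three impressions while `d3` was never clicked. Validation in the sense described above would check whether such a model-induced ordering of systems agrees with a ranking that is already known.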

Code & Models

Repositories

irgroup/validating-synthetic-usage-data (official)
