ORBIT -- Open Recommendation Benchmark for Reproducible Research with Hidden Tests

Jingyuan He; Jiongnan Liu; Vishan Vishesh Oberoi; Bolin Wu; Mahima Jagadeesh Patel; Kangrui Mao; Chuning Shi; I-Ta Lee; Arnold Overwijk; Chenyan Xiong

arXiv:2510.26095 · cs.IR · October 31, 2025

TL;DR

ORBIT is a benchmark for evaluating recommender systems. It provides public datasets with standardized, reproducible evaluation settings, a new webpage recommendation task (ClueWeb-Reco), and a hidden test for assessing model generalization. Its results point to the limitations of current models and the room left for improvement.

Contribution

This paper introduces ORBIT, a unified, reproducible benchmark with a novel webpage recommendation task and a hidden test, designed to address inconsistent evaluation settings in recommender system research.

Findings

01. General improvements are observed on public datasets.

02. Performance varies across models.

03. An LLM baseline shows potential for large-scale webpage recommendation.

Abstract

Recommender systems are among the most impactful AI applications, interacting with billions of users every day, guiding them to relevant products, services, or information tailored to their preferences. However, the research and development of recommender systems are hindered by existing datasets that fail to capture realistic user behaviors and inconsistent evaluation settings that lead to ambiguous conclusions. This paper introduces the Open Recommendation Benchmark for Reproducible Research with HIdden Tests (ORBIT), a unified benchmark for consistent and realistic evaluation of recommendation models. ORBIT offers a standardized evaluation framework of public datasets with reproducible splits and transparent settings for its public leaderboard. Additionally, ORBIT introduces a new webpage recommendation task, ClueWeb-Reco, featuring web browsing sequences from 87 million public,…
