SRBench: A Comprehensive Benchmark for Sequential Recommendation with Large Language Models

Jianhong Li; Zeheng Qian; Wangze Ni; Haoyang Li; Hongwei Yao; Yang Bai; Kui Ren

arXiv:2604.09553 · cs.IR · April 14, 2026

TL;DR

SRBench is a new, comprehensive benchmark for evaluating sequential recommendation (SR) models, especially LLM-based ones, along several practically relevant dimensions: accuracy, fairness, stability, and efficiency.

Contribution

It introduces a multi-dimensional evaluation framework, a unified prompt-based input paradigm, and a novel answer extraction mechanism for fair comparison of SR models.
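The page names a unified prompt-based input paradigm but does not show its template. As an illustration only, here is a minimal sketch that assumes the paradigm renders one user's interaction history and a candidate set as a single natural-language prompt. The function `build_sr_prompt`, its wording, the bracketed candidate labels, and the `max_history` cutoff are all hypothetical and are not SRBench's actual prompt.

```python
from typing import Sequence


def build_sr_prompt(history: Sequence[str], candidates: Sequence[str], max_history: int = 10) -> str:
    """Render one user's history and candidate set as a single text prompt.

    Hypothetical template for illustration; not SRBench's actual prompt.
    """
    # Keep only the most recent interactions (guard: list[-0:] would return everything).
    recent = list(history)[-max_history:] if max_history > 0 else []
    history_lines = "\n".join(f"{i}. {title}" for i, title in enumerate(recent, start=1))
    # Label candidates with 1-based bracketed indices so the answer is easy to parse later.
    candidate_lines = "\n".join(f"[{i}] {title}" for i, title in enumerate(candidates, start=1))
    return (
        "The user interacted with these items, in chronological order:\n"
        f"{history_lines}\n\n"
        "Which candidate will the user interact with next?\n"
        f"{candidate_lines}\n\n"
        "Answer with the bracketed number of one candidate."
    )


prompt = build_sr_prompt(["The Matrix", "Heat", "Toy Story"], ["Alien", "Up"], max_history=2)
print(prompt)
```

Because the history and candidate set are explicit arguments, the same inputs could also be passed to an NN-SR model as item IDs, which is one plausible way to keep the two model families on equal footing.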

Findings

1. LLM-SR models overemphasize item popularity.

2. SRBench enables a fair, multi-dimensional assessment of SR models.

3. An evaluation of 13 models yields insights into the capabilities of LLM-SR models.
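The first finding concerns popularity bias, but the page does not say how SRBench measures it. The following is one hypothetical way to quantify it, not the paper's metric: give each item a popularity percentile from its training-set interaction count, then compare the mean percentile of a model's recommendations with that of the items users actually chose. The function names and the convention of scoring unseen items as 0.0 are assumptions.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable, Sequence


def popularity_percentiles(train_items: Iterable[str]) -> Dict[str, float]:
    """Map each training item to the fraction of items whose count is <= its own count."""
    counts = Counter(train_items)
    sorted_counts = sorted(counts.values())
    n = len(sorted_counts)
    # bisect_right counts how many items have an interaction count <= c (ties share a value).
    return {item: bisect_right(sorted_counts, c) / n for item, c in counts.items()}


def mean_popularity(items: Sequence[str], pct: Dict[str, float]) -> float:
    """Average popularity percentile; items unseen in training are scored 0.0 (assumption)."""
    return sum(pct.get(item, 0.0) for item in items) / len(items)


def popularity_gap(recommended: Sequence[str], ground_truth: Sequence[str], pct: Dict[str, float]) -> float:
    """Positive values mean the model recommends more popular items than users actually chose."""
    return mean_popularity(recommended, pct) - mean_popularity(ground_truth, pct)


pct = popularity_percentiles(["a", "a", "a", "b", "b", "c"])
print(popularity_gap(["a", "a"], ["b", "c"], pct))
```

Using percentiles rather than raw counts keeps the gap on a 0 to 1 scale, so it stays comparable across datasets of different sizes.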

Abstract

The development of large language models (LLMs) has sparked great interest in applying them to Sequential Recommendation (SR). However, SR models still lack a comprehensive evaluation because existing benchmarks have three limitations: 1) an overemphasis on accuracy that ignores other real-world demands (e.g., fairness); 2) datasets that fail to exploit the full potential of LLMs, leading to unfair comparisons between Neural-Network-based SR (NN-SR) models and LLM-based SR (LLM-SR) models; and 3) no reliable mechanism for extracting task-specific answers from unstructured LLM outputs. To address these limitations, we propose SRBench, a comprehensive SR benchmark with three core designs: 1) a multi-dimensional framework covering accuracy, fairness, stability, and efficiency, aligned with practical demands; 2) a unified input paradigm, built through prompt engineering, that boosts LLM-SR performance and enables fair comparisons between models; 3)…
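The abstract is cut off before it describes SRBench's answer-extraction mechanism, so the following is not that mechanism. It is a minimal sketch of the general problem the abstract raises, mapping unstructured LLM output to one candidate item. It assumes candidates were shown to the model with 1-based bracketed labels such as `[2]`. The function name `extract_answer`, the three-stage fallback order, and the 0.6 similarity cutoff are illustrative choices. Only `re` and `difflib` come from Python's standard library.

```python
import difflib
import re
from typing import Optional, Sequence


def extract_answer(output: str, candidates: Sequence[str], cutoff: float = 0.6) -> Optional[int]:
    """Map free-form LLM output to a 0-based candidate index, or None if nothing matches."""
    # 1) An explicit bracketed index such as "[3]" (1-based, matching the assumed prompt labels).
    m = re.search(r"\[(\d+)\]", output)
    if m:
        idx = int(m.group(1)) - 1
        if 0 <= idx < len(candidates):
            return idx
    # 2) A candidate title quoted verbatim (case-insensitive); prefer the longest such title.
    lowered = output.lower()
    verbatim = [i for i, title in enumerate(candidates) if title.lower() in lowered]
    if verbatim:
        return max(verbatim, key=lambda i: len(candidates[i]))
    # 3) Fuzzy fallback: the closest title by difflib's similarity ratio, if above the cutoff.
    lowered_titles = [t.lower() for t in candidates]
    close = difflib.get_close_matches(output.strip().lower(), lowered_titles, n=1, cutoff=cutoff)
    if close:
        return lowered_titles.index(close[0])
    return None  # unparseable output; a benchmark would count this as a miss


titles = ["The Matrix", "Heat", "Toy Story"]
print(extract_answer("I'd pick [2].", titles))
print(extract_answer("Toy Storry", titles))
```

Returning `None` rather than guessing makes unparseable outputs explicit, so a benchmark can report them separately instead of silently scoring them as wrong or right.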
