An Embarrassingly Simple Graph Heuristic Reveals Shortcut-Solvable Benchmarks for Sequential Recommendation

Haoyu Han; Li Ma; Hanbing Wang; Bingheng Li; Daochen Zha; Chun How Tan; Huiji Gao; Xin Liu; Stephanie Moyerman; Sanjeev Katariya; Hui Liu; Jiliang Tang

arXiv:2605.07125 · cs.IR · May 11, 2026

TL;DR

A simple, training-free graph heuristic matches or outperforms complex models on widely used sequential recommendation benchmarks, suggesting that these datasets may not require advanced modeling and that evaluation practices need to improve.

Contribution

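The paper audits sequential recommendation benchmarks with an intentionally simple graph heuristic: starting from the last one or two interacted items, it retrieves candidates from a few-hop item-transition graph and ranks them by item-feature similarity. The results challenge the assumption that high benchmark performance indicates advanced modeling capability.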

Findings

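01. Without a sequence encoder, generative objective, or training, the heuristic matches or outperforms many modern baselines, with relative NDCG@10 improvements of 38.10% and 44.18% over the best competing baseline on Amazon Review Sports and CDs.

02. Shortcut structures in the datasets can make simple methods highly effective.

03. High benchmark performance may not indicate advanced modeling capability.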

Abstract

Sequential recommendation has increasingly shifted toward generative recommenders that combine sequential patterns with semantic item information. Yet these methods are often evaluated on a small set of widely used benchmarks, raising a key question: do these benchmarks actually require the advanced modeling capabilities that modern generative recommenders claim to provide? We conduct a benchmark audit with an intentionally simple graph heuristic. Starting from only the last one or two interacted items, it retrieves candidates from a few-hop item-transition graph and ranks them by item-feature similarity. Despite using no sequence encoder, generative objective, or training, this heuristic matches or outperforms many modern baselines, with relative NDCG@10 improvements of 38.10% and 44.18% over the best competing baseline on Amazon Review Sports and CDs. We show that this behavior…
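The retrieve-then-rank procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`build_transition_graph`, `few_hop_candidates`, `recommend`), the directed edges, the cosine similarity, the max-over-seeds scoring, and the toy data are all hypothetical choices made here, since the abstract does not specify them.

```python
from collections import defaultdict
import math


def build_transition_graph(sequences):
    """Item-transition graph: edge a -> b when b immediately follows a.

    Assumption: edges are directed and unweighted; the abstract does not say.
    """
    graph = defaultdict(set)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                graph[a].add(b)
    return graph


def few_hop_candidates(graph, seeds, max_hops=2):
    """Items reachable from any seed within max_hops edges, seeds excluded."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(max_hops):
        # Expand one hop, keeping only items not seen before.
        frontier = {n for item in frontier for n in graph.get(item, ())} - visited
        visited |= frontier
    return visited - set(seeds)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(history, graph, features, n_seeds=2, max_hops=2, top_k=10):
    """Retrieve few-hop neighbors of the last n_seeds items, rank by similarity.

    Assumption: a candidate's score is its highest cosine similarity to any
    seed item; ties are broken by item id for determinism.
    """
    seeds = history[-n_seeds:]
    candidates = few_hop_candidates(graph, seeds, max_hops)
    scored = [
        (max(cosine(features[c], features[s]) for s in seeds), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:top_k]]


# Toy data (hypothetical): three interaction sequences and 2-d item features.
sequences = [["a", "b", "c"], ["a", "b", "d"], ["b", "d", "e"]]
features = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.8, 0.2],
    "d": [0.0, 1.0],
    "e": [0.1, 0.9],
}
graph = build_transition_graph(sequences)
history = ["a", "b"]
print(recommend(history, graph, features))
```

Nothing here is trained: the graph is built by counting adjacent pairs in the interaction sequences, and ranking uses fixed item features. In practice the feature vectors would presumably come from item metadata or text embeddings, which is also an assumption of this sketch.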
