LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems

Baptiste Bonin; Maxime Heuillet; Audrey Durand

arXiv:2511.04541·cs.IR·November 7, 2025

TL;DR

This paper investigates whether large language models can act as world models of user preferences in slate recommendation, using pairwise reasoning over slates in an empirical study of several LLMs on three tasks spanning different datasets.

Contribution

The paper presents an empirical study of several LLMs on three tasks spanning different datasets, examining how well LLMs act as world models of user preferences in slate recommendation through pairwise reasoning over slates.

Findings

01. LLMs show promise as world models of user preferences.

02. Task performance is related to properties of the preference function captured by the LLM.

03. These relationships point to areas for improvement in LLM-based recommender systems.

Abstract

Modeling user preferences across domains remains a key challenge in slate recommendation research (i.e., recommending an ordered sequence of items). We investigate how Large Language Models (LLMs) can effectively act as world models of user preferences through pairwise reasoning over slates. We conduct an empirical study involving several LLMs on three tasks spanning different datasets. Our results reveal relationships between task performance and properties of the preference function captured by LLMs, hinting at areas for improvement and highlighting the potential of LLMs as world models in recommender systems.
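
To make the idea of pairwise reasoning over slates concrete, here is a minimal Python sketch of how a pairwise judge could be evaluated. It is an illustration under stated assumptions, not the authors' method: `toy_judge` is a hypothetical stand-in for an LLM call, the slates and labels are made-up toy data, and asymmetry and transitivity are example properties of a preference function chosen for illustration (the abstract does not name the properties the paper studies).

```python
from itertools import combinations, permutations
from typing import Callable, List, Sequence, Tuple

Slate = Tuple[str, ...]                 # an ordered sequence of item names
Judge = Callable[[Slate, Slate], bool]  # True if the first slate is preferred


def toy_judge(a: Slate, b: Slate) -> bool:
    """Hypothetical stand-in for an LLM judge (assumption, not a real API).

    Prefers the slate that places more 'liked' items near the top.
    """
    liked = {"jazz", "blues"}  # made-up user taste

    def score(slate: Slate) -> float:
        return sum(1.0 / (rank + 1) for rank, item in enumerate(slate) if item in liked)

    return score(a) > score(b)


def pairwise_accuracy(judge: Judge, labeled: Sequence[Tuple[Slate, Slate, bool]]) -> float:
    """Fraction of labeled pairs where the judge agrees with the label."""
    return sum(judge(a, b) == label for a, b, label in labeled) / len(labeled)


def asymmetry_rate(judge: Judge, slates: Sequence[Slate]) -> float:
    """Fraction of unordered pairs where the judge never prefers both orders."""
    pairs = list(combinations(slates, 2))
    return sum(not (judge(a, b) and judge(b, a)) for a, b in pairs) / len(pairs)


def transitivity_rate(judge: Judge, slates: Sequence[Slate]) -> float:
    """Among ordered triples with a > b and b > c, fraction where a > c also holds."""
    premises = holds = 0
    for a, b, c in permutations(slates, 3):
        if judge(a, b) and judge(b, c):
            premises += 1
            holds += judge(a, c)
    return holds / premises if premises else 1.0


# Toy data (illustrative only).
SLATES: List[Slate] = [
    ("jazz", "blues", "pop"),
    ("pop", "jazz", "rock"),
    ("rock", "pop", "metal"),
]
LABELED_PAIRS = [
    (SLATES[0], SLATES[1], True),
    (SLATES[1], SLATES[0], False),
    (SLATES[1], SLATES[2], True),
]

accuracy = pairwise_accuracy(toy_judge, LABELED_PAIRS)
asymmetry = asymmetry_rate(toy_judge, SLATES)
transitivity = transitivity_rate(toy_judge, SLATES)
```

In practice, `toy_judge` would be replaced by a function that prompts an LLM with a user description and two candidate slates. Because the property checks only need the `Judge` signature, they can be compared against task accuracy for any judge.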

Taxonomy

Topics: Recommender Systems and Techniques · Sentiment Analysis and Opinion Mining · Expert Finding and Q&A Systems