OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

Ziyi Wang; Yuxuan Lu; Wenbo Li; Amirali Amini; Bo Sun; Yakov Bart; Weimin Lyu; Jiri Gesi; Tian Wang; Jing Huang; Yu Su; Upol Ehsan; Malihe Alikhani; Toby Jia-Jun Li; Lydia Chilton; Dakuo Wang

arXiv:2506.05606 · cs.CL · May 19, 2026

TL;DR

This paper introduces OPeRA, a dataset of human online shopping behavior that records personas, observations, actions, and rationales, and uses it to evaluate how well LLMs simulate an individual user's actions.

Contribution

The paper presents the first public dataset that combines user personas, observations, actions, and rationales for online shopping, and establishes a benchmark for how well LLMs predict a specific user's next action and rationale.

Findings

01. OPeRA enables evaluation of LLMs on predicting user actions and rationales.

02. The dataset contains high-fidelity data collected through an online questionnaire and a custom browser plugin.

03. A benchmark is established for assessing LLM performance in personalized behavior simulation.

Abstract

Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating "believable" human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPeRA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions. OPeRA is the first public dataset that comprehensively captures: user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We developed both an online questionnaire and a custom browser plugin to gather this dataset with high fidelity. Using OPeRA,…
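
To make the four components named in the abstract concrete, the following is a minimal sketch of how one shopping session and a next-action evaluation could be represented. It is an illustration only: the class names `Step` and `Session`, the field names, the action strings, and the exact-match scoring in `next_action_accuracy` are all assumptions made for this example, not the dataset's actual schema or the paper's metric.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One recorded moment in a session (hypothetical layout)."""
    observation: str  # what the browser showed at this step
    action: str       # the web action the participant took
    rationale: str    # self-reported reason; may be empty


@dataclass
class Session:
    """A persona plus the ordered steps of one shopping session (hypothetical)."""
    persona: str
    steps: List[Step]


# A predictor sees the persona, the earlier steps, and the current observation.
Predictor = Callable[[str, List[Step], str], str]


def next_action_accuracy(session: Session, predict: Predictor) -> float:
    """Fraction of steps where the predicted action exactly matches the recorded one."""
    if not session.steps:
        return 0.0
    correct = 0
    for i, step in enumerate(session.steps):
        history = session.steps[:i]  # only steps before the current one
        if predict(session.persona, history, step.observation) == step.action:
            correct += 1
    return correct / len(session.steps)


# Toy data, invented for illustration; not taken from the dataset.
toy_session = Session(
    persona="budget-conscious shopper",
    steps=[
        Step("home page", "click:search", "looking for headphones"),
        Step("search results", "click:product", "this one is cheapest"),
        Step("product page", "click:add_to_cart", ""),
    ],
)


def always_click_product(persona: str, history: List[Step], observation: str) -> str:
    """Trivial baseline that ignores its inputs."""
    return "click:product"


accuracy = next_action_accuracy(toy_session, always_click_product)
```

An LLM-backed predictor would replace `always_click_product` with a function that prompts a model using the persona, the history, and the current observation. Exact string match is the simplest possible scoring rule and is used here only to keep the sketch short.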
