SessionIntentBench: A Multi-task Inter-session Intention-shift Modeling Benchmark for E-commerce Customer Behavior Understanding

Yuqi Yang; Weiqi Wang; Baixuan Xu; Wei Fan; Qing Zong; Chunkit Chan; Zheye Deng; Xin Liu; Yifan Gao; Changlong Yu; Chen Luo; Yang Li; Zheng Li; Qingyu Yin; Bing Yin; Yangqiu Song

arXiv:2507.20185·cs.CL·April 13, 2026

TL;DR

This paper introduces SessionIntentBench, a large multimodal dataset and benchmark for modeling and understanding customer intention shifts across e-commerce sessions. It shows that current models struggle with this task and that adding intention information can improve them.

Contribution

It introduces the concept of an intention tree, builds a scalable dataset with extensive annotations, and evaluates L(V)LMs' capabilities in inter-session intention modeling.

Findings

01

Current L(V)LMs struggle to capture intention shifts.

02

Injecting intention information improves LLM performance.

03

The dataset includes over 1.9 million intention entries and 13 million tasks.

Abstract

Session history is a common way of recording user interaction behaviors throughout a browsing activity involving multiple products. For example, if a user clicks a product webpage and then leaves, it might be because certain features do not satisfy the user, which serves as an important indicator of on-the-spot user preferences. However, all prior works fail to capture and model customer intention effectively because of insufficient information exploitation: only apparent information such as descriptions and titles is used. There is also a lack of data and a corresponding benchmark for explicitly modeling intention in E-commerce product purchase sessions. To address these issues, we introduce the concept of an intention tree and propose a dataset curation pipeline. Together, we construct a sibling multimodal benchmark, SessionIntentBench, that evaluates L(V)LMs' capability on…