To See or To Read: User Behavior Reasoning in Multimodal LLMs

Tianning Dong; Luyi Ma; Varun Vasudevan; Jason Cho; Sushant Kumar; Kannan Achan

arXiv:2511.03845·cs.AI·November 7, 2025

Open Access

TL;DR

This paper introduces BehaviorLens, a benchmarking framework that compares textual and image representations of user-behavior data in multimodal LLMs. It finds that image representations improve next-purchase prediction accuracy by 87.5% over an equivalent textual representation.

Contribution

The paper presents BehaviorLens, a systematic framework for evaluating modality trade-offs in user-behavior reasoning across six MLLMs using a real-world purchase-sequence dataset.

Findings

01

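Representing transaction data as images improves MLLMs' next-purchase prediction accuracy by 87.5% over an equivalent textual representation, with no additional computational cost.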

02

Text and image modalities have distinct advantages for user behavior reasoning.

03

Benchmarking reveals modality trade-offs in MLLMs.

Abstract

Multimodal Large Language Models (MLLMs) are reshaping how modern agentic systems reason over sequential user-behavior data. However, whether textual or image representations of user-behavior data are more effective for maximizing MLLM performance remains underexplored. We present BehaviorLens, a systematic benchmarking framework for assessing modality trade-offs in user-behavior reasoning across six MLLMs by representing transaction data as (1) a text paragraph, (2) a scatter plot, and (3) a flowchart. Using a real-world purchase-sequence dataset, we find that when data is represented as images, MLLMs' next-purchase prediction accuracy improves by 87.5% compared with an equivalent textual representation, without any additional computational cost.
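
To make the three representations concrete, the following is a minimal sketch of how one purchase sequence might be serialized as a text paragraph, as scatter-plot coordinates, and as flowchart source. It is not the authors' pipeline: the `Purchase` fields, the sample `history`, the paragraph wording, and the choice of Graphviz DOT for the flowchart are all assumptions made for illustration. Rendering the coordinates or the DOT source to an actual image is left to a plotting or graph tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    # Hypothetical schema; the paper's dataset fields are not given in the abstract.
    day: int       # days since the first recorded purchase
    item: str
    category: str


def to_text_paragraph(history):
    """Representation 1: the sequence as a single prose paragraph."""
    parts = [f"on day {p.day} the user bought {p.item} ({p.category})" for p in history]
    return "Purchase history: " + "; ".join(parts) + "."


def to_scatter_points(history):
    """Representation 2: (x, y) points with x = day and y = index of the category.

    Categories are indexed in sorted order so the mapping is deterministic.
    """
    categories = sorted({p.category for p in history})
    return [(p.day, categories.index(p.category)) for p in history]


def to_flowchart_dot(history):
    """Representation 3: Graphviz DOT source for a left-to-right flowchart.

    Labels are not escaped, so item names are assumed to contain no double quotes.
    """
    lines = ["digraph purchases {", "  rankdir=LR;"]
    for i, p in enumerate(history):
        lines.append(f'  n{i} [shape=box, label="day {p.day}: {p.item}"];')
    for i in range(len(history) - 1):
        lines.append(f"  n{i} -> n{i + 1};")
    lines.append("}")
    return "\n".join(lines)


# Invented sample data, used only to exercise the three functions.
history = [
    Purchase(0, "milk", "dairy"),
    Purchase(3, "bread", "bakery"),
    Purchase(7, "yogurt", "dairy"),
]

if __name__ == "__main__":
    print(to_text_paragraph(history))
    print(to_scatter_points(history))
    print(to_flowchart_dot(history))
```

All three functions consume the same `history`, which mirrors the benchmark's premise that the text and image inputs carry equivalent information and differ only in modality.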

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)