UserGPT Technical Report

Yunyi Xuan; Hao Yi; Fengling Mao; Daye Cai; Leikun Liang; Xingsheng He; Jiangnan Xie; Guoshuai Wang; Yushan Han; Wenwen Guo; Xiaoxiao Xu; Lin Qu

arXiv:2605.08766 · cs.IR · May 12, 2026

TL;DR

This paper introduces UserGPT, a framework that uses large language models to build coherent user personas from noisy behavioral data, supported by a behavior simulation engine, data semantization, and curriculum-driven fine-tuning.

Contribution

The paper proposes a generative approach to personalized user profiling, with a pipeline that combines a simulation engine, data semantization, and curriculum-driven fine-tuning.

Findings

01. UserGPT achieves an Avg@10 score of 0.7325 on tag prediction.

02. UserGPT compresses behavioral data by up to 97.9% while preserving critical information.

03. The framework demonstrates effective holistic persona reasoning on the HPR-Bench benchmark.
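
To make the 97.9% compression figure concrete, the sketch below shows one way such a rate could be computed. It is an illustration only: the unit of measurement (whitespace-delimited tokens), the function name `compression_rate`, and the example inputs are assumptions, not details taken from the report.

```python
def compression_rate(original: str, compressed: str) -> float:
    """Fraction of whitespace-delimited tokens removed (0.0 = none, 1.0 = all).

    Assumption: size is counted in whitespace tokens; the report may use
    a different unit (e.g. model tokens or characters).
    """
    original_tokens = original.split()
    compressed_tokens = compressed.split()
    if not original_tokens:
        raise ValueError("original text must contain at least one token")
    return 1.0 - len(compressed_tokens) / len(original_tokens)


# Hypothetical raw behavior log of 1000 tokens.
raw_log = " ".join(["click"] * 1000)
# Hypothetical semantized summary of 21 tokens.
summary = " ".join(["summary"] * 21)

rate = compression_rate(raw_log, summary)
print(f"{rate:.1%}")  # prints "97.9%"
```

Under this definition, a 97.9% rate means the compressed representation keeps about 21 of every 1000 original tokens.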

Abstract

Personalized user understanding from large-scale digital traces remains a fundamental challenge. Traditional user profiling methods rely on discriminative models and manual feature engineering to predict discrete attributes, often producing fragmented and logically inconsistent profiles that generalize poorly to long-tail behaviors. In this work, we study a generative paradigm in which large language models (LLMs) summarize long and noisy behavioral histories into coherent narratives that capture nuanced user evolution. Our experiments show that even strong LLMs remain limited in complex and implicit personalization reasoning. We propose UserGPT, a framework for improving LLM-based persona understanding through both attribute generation and summary generation. To address the scarcity of real-world behavioral data, we develop a User Behavior Simulation Engine that produces realistic…
