Large language model as user daily behavior data generator: balancing population diversity and individual personality

Haoxin Li; Jingtao Ding; Jiahui Gong; Yong Li

arXiv:2505.17615 · cs.LG · May 26, 2025

TL;DR

This paper introduces BehaviorGen, a framework using large language models to generate synthetic human behavior data, improving prediction accuracy while addressing privacy concerns.

Contribution

The paper presents a novel LLM-based framework for generating high-quality synthetic behavior data, supporting data augmentation and replacement for behavior prediction models.

Findings

1. Achieved up to 18.9% improvement in prediction accuracy.
2. Demonstrated effectiveness in mobility and smartphone usage scenarios.
3. Supported flexible data augmentation and replacement strategies.

Abstract

Predicting human daily behavior is challenging due to the complexity of routine patterns and short-term fluctuations. While data-driven models have improved behavior prediction by leveraging empirical data from various platforms and devices, the reliance on sensitive, large-scale user data raises privacy concerns and limits data availability. Synthetic data generation has emerged as a promising solution, though existing methods are often limited to specific applications. In this work, we introduce BehaviorGen, a framework that uses large language models (LLMs) to generate high-quality synthetic behavior data. By simulating user behavior based on profiles and real events, BehaviorGen supports data augmentation and replacement in behavior prediction models. We evaluate its performance in scenarios such as pretraining augmentation, fine-tuning replacement, and fine-tuning augmentation,…
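The abstract names three evaluation scenarios but does not define them on this page. As a hedged sketch, assuming that "augmentation" means adding synthetic sequences to the observed training data and "replacement" means training on synthetic sequences instead of observed ones, the Python below shows how the training sets for each scenario might be assembled. The `Sequence` type, the strategy strings, `build_training_set`, `stage_datasets`, and the `SCENARIOS` mapping are all hypothetical illustrations, not BehaviorGen's code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """One user's behavior trajectory (hypothetical structure)."""
    user_id: str
    events: tuple[str, ...]  # ordered events, e.g. visited locations or opened apps
    synthetic: bool          # True if LLM-generated, False if observed


def build_training_set(real, synthetic, strategy, seed=0):
    """Assemble one training set from observed and synthetic sequences.

    Assumed strategies (hypothetical names):
      "real_only"    - baseline, observed data only
      "augmentation" - observed data plus synthetic data
      "replacement"  - synthetic data used in place of observed data
    """
    if strategy == "real_only":
        data = list(real)
    elif strategy == "augmentation":
        data = list(real) + list(synthetic)
    elif strategy == "replacement":
        data = list(synthetic)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    random.Random(seed).shuffle(data)  # seeded shuffle so the mix is reproducible
    return data


# Hypothetical reading of the abstract's scenario names: (training stage, strategy).
SCENARIOS = {
    "pretraining augmentation": ("pretraining", "augmentation"),
    "fine-tuning replacement": ("fine-tuning", "replacement"),
    "fine-tuning augmentation": ("fine-tuning", "augmentation"),
}


def stage_datasets(scenario, real_pre, syn_pre, real_ft, syn_ft, seed=0):
    """Return (pretraining_set, finetuning_set); synthetic data enters only the named stage."""
    stage, strategy = SCENARIOS[scenario]
    pre = build_training_set(
        real_pre, syn_pre, strategy if stage == "pretraining" else "real_only", seed)
    ft = build_training_set(
        real_ft, syn_ft, strategy if stage == "fine-tuning" else "real_only", seed)
    return pre, ft
```

Under this reading, "fine-tuning replacement" pretrains on observed data and fine-tunes only on synthetic sequences, which is the case most directly tied to the privacy concern the abstract raises.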