Large language model as user daily behavior data generator: balancing population diversity and individual personality
Haoxin Li, Jingtao Ding, Jiahui Gong, Yong Li

TL;DR
This paper introduces BehaviorGen, a framework using large language models to generate synthetic human behavior data, improving prediction accuracy while addressing privacy concerns.
Contribution
The paper presents a novel LLM-based framework for generating high-quality synthetic behavior data, supporting data augmentation and replacement for behavior prediction models.
Findings
Achieved up to 18.9% improvement in prediction accuracy.
Demonstrated effectiveness in mobility and smartphone usage scenarios.
Supported flexible data augmentation and replacement strategies.
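The difference between the augmentation and replacement strategies can be illustrated with a minimal sketch. It is not BehaviorGen's implementation: the record schema, the function name `build_training_set`, and the sample data are all hypothetical, chosen only to show how synthetic records might either supplement or stand in for real ones when assembling a training set.

```python
from typing import List, Sequence, Tuple

# Hypothetical record schema (user_id, hour_of_day, event); an assumption
# for illustration, not the data format used in the paper.
Record = Tuple[str, int, str]


def build_training_set(
    real: Sequence[Record],
    synthetic: Sequence[Record],
    strategy: str,
) -> List[Record]:
    """Assemble a training set from real and synthetic behavior records.

    "augment": keep the real records and append the synthetic ones.
    "replace": train on synthetic records only, so no real (sensitive)
               records enter the training set.
    """
    if strategy == "augment":
        return list(real) + list(synthetic)
    if strategy == "replace":
        return list(synthetic)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Made-up sample data, for illustration only.
real_records: List[Record] = [
    ("u1", 8, "commute"),
    ("u1", 12, "lunch"),
]
synthetic_records: List[Record] = [
    ("s1", 8, "commute"),
    ("s1", 13, "lunch"),
    ("s1", 19, "gym"),
]

augmented = build_training_set(real_records, synthetic_records, "augment")
replaced = build_training_set(real_records, synthetic_records, "replace")
```

Under replacement, the downstream prediction model never sees the real records, which is where the privacy benefit would come from; under augmentation, the synthetic records simply enlarge the real dataset.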
Abstract
Predicting human daily behavior is challenging due to the complexity of routine patterns and short-term fluctuations. While data-driven models have improved behavior prediction by leveraging empirical data from various platforms and devices, the reliance on sensitive, large-scale user data raises privacy concerns and limits data availability. Synthetic data generation has emerged as a promising solution, though existing methods are often limited to specific applications. In this work, we introduce BehaviorGen, a framework that uses large language models (LLMs) to generate high-quality synthetic behavior data. By simulating user behavior based on profiles and real events, BehaviorGen supports data augmentation and replacement in behavior prediction models. We evaluate its performance in scenarios such as pretraining augmentation, fine-tuning replacement, and fine-tuning augmentation,…
