DeepPersona: A Generative Engine for Scaling Deep Synthetic Personas
Zhen Wang, Yufan Zhou, Zhongyan Luo, Lyumanshan Ye, Adam Wood, Man Yao, Saab Mansour, and Luoshang Pan

TL;DR
DeepPersona introduces a scalable generative engine that creates highly detailed and diverse synthetic human personas, significantly advancing the realism and utility of AI simulations and personalization.
Contribution
It develops the largest human-attribute taxonomy and a two-stage sampling method to generate deep, narrative-rich personas with hundreds of attributes.
Findings
32% higher attribute coverage than baselines
44% greater profile uniqueness
11.6% improvement in personalized question answering
Abstract
Simulating human profiles by instilling personas into large language models (LLMs) is rapidly transforming research in agentic behavioral simulation, LLM personalization, and human-AI alignment. However, most existing synthetic personas remain shallow and simplistic, capturing minimal attributes and failing to reflect the rich complexity and diversity of real human identities. We introduce DEEPPERSONA, a scalable generative engine for synthesizing narrative-complete synthetic personas through a two-stage, taxonomy-guided method. First, we algorithmically construct the largest-ever human-attribute taxonomy, comprising hundreds of hierarchically organized attributes, by mining thousands of real user-ChatGPT conversations. Second, we progressively sample attributes from this taxonomy, conditionally generating coherent and realistic personas that average hundreds of structured…
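As a rough illustration of the two-stage idea described in the abstract (pick attributes from a hierarchical taxonomy, then fill in values one at a time with the partial persona as context), here is a minimal Python sketch. It is not the authors' implementation: the toy `TAXONOMY`, the function names, and the uniform sampling are all assumptions made for illustration, and `fill_value` is a stub standing in for what would presumably be an LLM call.

```python
import random

# Hypothetical toy taxonomy (not from the paper): nested dicts are
# categories, lists hold candidate values for a leaf attribute.
TAXONOMY = {
    "demographics": {
        "age_group": ["18-29", "30-44", "45-64", "65+"],
        "occupation": ["nurse", "teacher", "engineer", "chef"],
    },
    "lifestyle": {
        "hobbies": {
            "outdoor": ["hiking", "cycling", "gardening"],
            "indoor": ["chess", "baking", "painting"],
        },
        "diet": ["omnivore", "vegetarian", "vegan"],
    },
}


def leaf_paths(tree, prefix=()):
    """Yield (path, candidate_values) for every leaf attribute."""
    for name, child in tree.items():
        path = prefix + (name,)
        if isinstance(child, dict):
            yield from leaf_paths(child, path)
        else:
            yield path, child


def sample_attributes(taxonomy, k, rng):
    """Stage 1: choose up to k distinct leaf attributes from the taxonomy."""
    leaves = list(leaf_paths(taxonomy))
    return rng.sample(leaves, min(k, len(leaves)))


def fill_value(path, candidates, persona_so_far, rng):
    """Stage 2 stand-in: choose a value given the partial persona.

    Assumption: a real system would condition a generator on
    persona_so_far; this stub samples uniformly and ignores it.
    """
    return rng.choice(candidates)


def generate_persona(taxonomy, k=4, seed=0):
    """Run both stages, filling attributes progressively."""
    rng = random.Random(seed)
    persona = {}
    for path, candidates in sample_attributes(taxonomy, k, rng):
        persona["/".join(path)] = fill_value(path, candidates, persona, rng)
    return persona


if __name__ == "__main__":
    print(generate_persona(TAXONOMY, k=3, seed=1))
```

Passing the partial persona into `fill_value` is the hook where conditional generation would happen; replacing the stub with a model call that reads `persona_so_far` is what would keep later attributes coherent with earlier ones.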
Peer Reviews
Decision · Submitted to ICLR 2026
I cannot find any strength or contribution in the current manuscript of the paper.
The writing is misleading and difficult to follow, suggesting that the paper is not yet in a finished state. In the abstract, the authors claim to be “mining thousands of real user–ChatGPT conversations.” However, Section 3 shows that no new data were mined; instead, the work relies entirely on existing datasets (Puffin, prefeval_implicit_persona). This inconsistency significantly weakens the claimed novelty. At the start of Section 3, several important concepts are introduced without any expl
- The synthetic persona generation problem is interesting and timely.
- They provide a large-scale taxonomy of human attributes, which is really beneficial for the literature.
- The experiments are good, showing the advantages of the generated synthetic persona.
- The authors provide a method to diversify selected attributes.

- The method is naive. They did break the sampling procedure into two stages (sampling attributes from the taxonomy first and then sampling values from the given attributes), but they are heavily manually engineered.
- Generally, it seems to be a neat paper and can bring benefits to the community, but the novelty is limited. It would be more appreciated if this paper were submitted to the benchmark and dataset tracks instead of the main tracks.

- The motivation is clear and important, pinpointing the problem of "persona depth" in previous persona generation approaches.
- The method the authors use to extract is systematic and thoughtful.
- Evaluation is done extensively in a multi-faceted manner, ranging from four different downstream tasks.
- Experiments are conducted on many frontier AI models from different sources, further supporting the generality of this work.
- Human experiments are included to complement the possible concerns
I did not spot any significant weaknesses in this paper. One minor regret would be that qualitative examples are limited. It would be great to see qualitative comparisons between previous approaches and DeepPersona. Also, this is minor, but there are some formatting issues on page 23. Please amend the overflow issue.
Taxonomy
Topics: Persona Design and Applications · Machine Learning in Healthcare · Topic Modeling
