Synthia: Scalable Grounded Persona Generation from Social Media Data
Vahid Rahimzadeh, Erfan Moosavi Monazzah, Mohammad Taher Pilehvar, Yadollah Yaghoobzadeh

TL;DR
Synthia is a scalable framework that generates authentic, high-fidelity personas grounded in social media data, improving alignment with human opinions and fairness across demographics.
Contribution
Synthia introduces a method for constructing scalable, realistic personas grounded in social media data, with improved fairness across demographics and preserved social network structure.
Findings
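Aligns more closely with human opinion distributions than prior state-of-the-art methods across multiple social-survey benchmarks, while relying on substantially smaller models.
Shows improved fairness for most demographic groups in a multi-dimensional fairness and bias analysis.
Preserves the interaction-graph structure among personas, enabling network-aware analysis.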
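As a rough illustration of what "alignment with human opinion distributions" can mean, the sketch below compares the answer distribution produced by a set of personas on one survey question with a reference human distribution. The metric (total variation distance), the function names, and the toy numbers are all assumptions made for illustration; this summary does not state which metric or data the paper uses.

```python
from collections import Counter


def answer_distribution(answers, options):
    """Return the share of responses for each option, in option order."""
    if not answers:
        raise ValueError("answers must not be empty")
    counts = Counter(answers)
    return [counts[option] / len(answers) for option in options]


def total_variation(p, q):
    """Total variation distance between two distributions over the same options.

    0.0 means identical distributions; 1.0 means no overlap.
    Chosen here only as a simple stand-in metric (an assumption).
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical survey question with three answer options.
options = ["agree", "neutral", "disagree"]

# Hypothetical human reference distribution (made-up numbers).
human = [0.5, 0.25, 0.25]

# Hypothetical answers given by four simulated personas.
persona_answers = ["agree", "agree", "agree", "disagree"]
persona = answer_distribution(persona_answers, options)

gap = total_variation(human, persona)
print(f"persona distribution: {persona}, gap to humans: {gap}")
```

A smaller gap indicates closer alignment on that question; averaging such gaps over many questions would give one possible benchmark-level score.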
Abstract
Persona-driven simulations are increasingly used in computational social science, yet their validity critically depends on the fidelity of the underlying personas. Constructing virtual populations that are both authentic and scalable remains a central challenge. We introduce Synthia, a persona-generation framework that grounds LLM-generated personas in real social-media posts while delegating narrative construction to language models, using publicly available data from the Bluesky platform. Across multiple social-survey benchmarks, Synthia improves alignment with human opinion distributions over prior state-of-the-art approaches while relying on substantially smaller models. A multi-dimensional fairness and bias analysis shows that Synthia outperforms previous methods for most demographics across different dimensions. Uniquely, Synthia preserves interaction-graph structure among…
