# Towards Operational Validation of LLM-Agent Social Simulations: A Replicated Study of a Reddit-like Technology Forum

**Authors:** Aleksandar Tomašević, Darja Cvetković, Sara Major, Slobodan Maletić, Miroslav Anđelković, Ana Vranić, Boris Stupovski, Dušan Vudragović, Aleksandar Bogojević, Marija Mitrović Dankulov

arXiv: 2508.21740 · 2026-04-27

## TL;DR

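This study validates LLM-agent social simulations by comparing 30 simulated runs of a Reddit-like technology forum with matched real Voat data. The comparison covers five dimensions: activity, network structure, toxicity, topical coverage, and stylistic convergence. It finds close similarities on some of these dimensions and systematic divergences on others.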

## Contribution

It provides the first systematic, multi-run, multi-dimensional validation of LLM-agent social simulations in a platform-faithful environment. It also identifies concrete targets for improvement, notably stateless agent design and content-layer calibration.

## Key findings

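- Unique users, root posts, and daily active users in simulation have 99% confidence intervals that overlap with those of the real data. Comments and average thread length remain higher in simulation.
- Simulated networks exhibit a core-periphery structure similar to real networks. However, simulated cores are larger and more diffuse, and repeated interactions are less frequent.
- Mean toxicity is higher in simulation and is misallocated across content layers. Simulated root posts are more toxic than real submissions, and simulated comments are less toxic than real comments.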
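The activity comparison rests on checking whether two 99% confidence intervals overlap. One interval is computed from 30 simulation runs and the other from 30 real comparison windows. Below is a minimal sketch of that check.

This summary does not say exactly how the paper builds its intervals. The sketch therefore assumes a normal approximation for the interval around the mean of per-run values. A Student-t interval would be slightly wider. The function names `ci99`, `intervals_overlap`, and `compare_metric` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci99(values):
    """Return (low, high) of a two-sided 99% CI for the mean.

    Assumption: normal approximation (z ~ 2.576); needs at least 2 values.
    """
    m = mean(values)
    z = NormalDist().inv_cdf(0.995)  # two-sided 99% critical value
    half = z * stdev(values) / sqrt(len(values))
    return m - half, m + half


def intervals_overlap(a, b):
    """True if the closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_metric(sim_runs, real_windows):
    """Compare one metric (e.g. daily active users) across runs and windows."""
    sim_ci = ci99(sim_runs)
    real_ci = ci99(real_windows)
    return {"sim": sim_ci, "real": real_ci,
            "overlap": intervals_overlap(sim_ci, real_ci)}


if __name__ == "__main__":
    # Toy values for illustration only, not data from the paper.
    sim = [float(x) for x in range(100, 130)]
    real = [float(x) for x in range(105, 135)]
    print(compare_metric(sim, real))
```

The same helper can be applied to any per-run scalar, such as the mean toxicity of root posts or of comments. This allows each content layer to be compared separately.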

## Abstract

Validation of LLM-agent social simulations remains underdeveloped, with most studies relying on subjective assessments or single runs. We address this gap by running 30 independent 30-day simulations of a technology forum modeled on Voat's v/technology, using stateless Dolphin Mistral 24B agents on the Y Social platform, and evaluating operational validity across five dimensions: activity patterns, network structure, toxicity, topical coverage, and stylistic convergence. Against 30 matched, non-overlapping 30-day Voat comparison windows, results show overlapping 99% confidence intervals for unique users, root posts, and daily active users, while comments, average thread length, and mean toxicity remain higher in simulation. Both simulated and empirical networks exhibit core-periphery structure, though simulated cores are larger and more diffuse and repeated interactions are less frequent. Topic alignment is near-complete, but toxicity is misallocated across content layers: simulated root posts are substantially more toxic than real submissions, while simulated comments are less toxic than Voat comments. These findings demonstrate that LLM agents in platform-faithful environments can reproduce familiar online regularities, while systematic divergences, particularly those linked to stateless agent design and content-layer calibration, point to concrete directions for future improvement.
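The network findings involve two quantities: the size of the network core and how often user pairs interact more than once. Below is a minimal sketch of both.

The sketch makes several assumptions that this summary does not confirm:

- The interaction network is built from (replier, parent author) pairs.
- Self-replies are dropped.
- k-core decomposition stands in as a simple proxy for core-periphery structure. The paper's own core-periphery method may differ.

All function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_edges(replies):
    """Count undirected user-user interactions, dropping self-replies."""
    weights = Counter()
    for src, dst in replies:
        if src != dst:
            weights[frozenset((src, dst))] += 1
    return weights


def repeated_share(weights):
    """Fraction of interacting pairs that interacted more than once."""
    if not weights:
        return 0.0
    return sum(1 for w in weights.values() if w > 1) / len(weights)


def core_numbers(weights):
    """k-core decomposition by repeatedly peeling a minimum-degree node."""
    adj = defaultdict(set)
    for pair in weights:
        u, v = tuple(pair)
        adj[u].add(v)
        adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # core number never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def core_share(core):
    """Share of users in the innermost (maximum) k-core."""
    if not core:
        return 0.0
    kmax = max(core.values())
    return sum(1 for k in core.values() if k == kmax) / len(core)


if __name__ == "__main__":
    # Toy reply log (hypothetical): a triangle a-b-c plus a pendant user d.
    replies = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"), ("d", "a")]
    w = build_edges(replies)
    print(repeated_share(w), core_share(core_numbers(w)))
```

Under this proxy, a larger and more diffuse simulated core would show up as a higher `core_share`. Less frequent repeated interactions would show up as a lower `repeated_share`.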

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21740/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21740/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/2508.21740/full.md

---
Source: https://tomesphere.com/paper/2508.21740