Interaction Theater: A case of LLM Agents Interacting at Scale
Sarath Shekkizhar, Adam Earle

TL;DR
This study empirically analyzes large-scale interactions among autonomous LLM agents on a social platform, revealing that while they produce diverse text, substantive engagement is rare, highlighting the need for explicit coordination mechanisms.
Contribution
It provides the first large-scale empirical characterization of LLM agent interactions, combining lexical, semantic, and validation metrics to assess interaction quality.
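One of the semantic measures compares the embedding of a comment with the embedding of its post. Below is a minimal sketch of cosine similarity. It assumes the embeddings have already been produced by some sentence-embedding model; the page does not name the model. The function name and the 3-d vectors are hypothetical toy values, not data from the study.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy 3-d vectors standing in for real sentence embeddings (hypothetical values).
post_vec = [0.9, 0.1, 0.0]
on_topic_comment = [0.8, 0.2, 0.1]
generic_comment = [0.0, 0.1, 0.9]

print(cosine_similarity(post_vec, on_topic_comment))  # close to 1: similar direction
print(cosine_similarity(post_vec, generic_comment))   # close to 0: unrelated direction
```

Cosine similarity ignores vector length, so a short comment and a long post can still score as close if they point in the same semantic direction.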
Findings
Most agents (67.5%) vary their output across contexts
65% of comments share no distinguishing content vocabulary with the post
Comments are mostly generic, and many are spam or off-topic (28% spam, 22% off-topic)
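The 65% figure counts comments whose content vocabulary does not overlap with the post's. Below is a minimal sketch of a lexical check of this kind. It rests on several assumptions: a lowercase word tokenizer, a small illustrative stopword list, and plain Jaccard similarity over content-word sets. The paper's exact "Jaccard specificity" definition and preprocessing may differ, and all helper names are hypothetical.

```python
import re

# Illustrative stopword list (an assumption; the paper's preprocessing is not given here).
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "of", "to", "in", "on",
             "this", "that", "it", "for", "with", "i", "you", "we", "great", "post"}


def content_words(text: str) -> set[str]:
    """Lowercase, extract letter runs, and drop stopwords (hypothetical helper)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Plain Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def zero_overlap_fraction(post: str, comments: list[str]) -> float:
    """Share of comments that have no content word in common with the post."""
    post_vocab = content_words(post)
    hits = sum(1 for c in comments if not (content_words(c) & post_vocab))
    return hits / len(comments) if comments else 0.0


# Made-up example thread (not data from the study).
post = "Benchmarking vector databases for retrieval latency"
comments = [
    "Great post! Love this.",
    "Retrieval latency matters more than recall here.",
    "Follow me for crypto tips",
]
for c in comments:
    print(round(jaccard(content_words(post), content_words(c)), 3), c)
print(zero_overlap_fraction(post, comments))
```

In this toy thread only the second comment reuses post vocabulary ("retrieval", "latency"), so two of the three comments count as zero-overlap.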
Abstract
As multi-agent architectures and agent-to-agent protocols proliferate, a fundamental question arises: what actually happens when autonomous LLM agents interact at scale? We study this question empirically using data from Moltbook, an AI-agent-only social platform, with 800K posts, 3.5M comments, and 78K agent profiles. We combine lexical metrics (Jaccard specificity), embedding-based semantic similarity, and LLM-as-judge validation to characterize agent interaction quality. Our findings reveal that agents produce diverse, well-formed text that creates the surface appearance of active discussion, but the substance is largely absent. Specifically, while most agents (67.5%) vary their output across contexts, 65% of comments share no distinguishing content vocabulary with the post they appear under, and information gain from additional comments decays rapidly. LLM-judge-based metrics…
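The abstract reports that information gain from additional comments decays rapidly. One simple way to picture this is to measure, for each successive comment, the fraction of its words not already seen in the post or in earlier comments. This is a hypothetical proxy chosen for illustration; the paper's actual information-gain measure may be defined differently (for example, over embeddings or entropy). The thread below is invented.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; no stopword removal in this sketch (an assumption)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def marginal_novelty(post: str, comments: list[str]) -> list[float]:
    """For each comment in order, the fraction of its words not already seen in the
    post or in earlier comments (a hypothetical proxy for information gain)."""
    seen = tokens(post)
    gains = []
    for comment in comments:
        words = tokens(comment)
        new = words - seen
        gains.append(len(new) / len(words) if words else 0.0)
        seen |= words  # later comments are judged against everything seen so far
    return gains


# Invented thread illustrating diminishing novelty.
post = "How should agents share memory across sessions?"
comments = [
    "Use a vector store keyed by session id.",
    "A vector store keyed by user works too.",
    "Agreed, a vector store works.",
    "Agreed, vector store.",
]
print(marginal_novelty(post, comments))
```

The first reply adds entirely new words, while later replies mostly restate earlier ones, so the per-comment novelty falls toward zero.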
Taxonomy
Topics: Topic Modeling · AI in Service Interactions · Expert finding and Q&A systems
