Statistical laws and linguistics inform meaning in naturalistic and fictional conversation

Ashley M. A. Fehr; Calla G. Beauregard; Julia Witte Zimmerman; Katie Ekström; Pablo Rosillo-Rodes; Christopher M. Danforth; Peter Sheridan Dodds

arXiv:2512.18072·cs.CL·January 1, 2026

TL;DR

This paper examines how Heaps' law, a statistical pattern describing vocabulary growth, applies to conversation. It analyzes naturalistic and fictional dialogues to understand how language features such as part of speech affect vocabulary scaling.

Contribution

It measures Heaps' law in conversation, a setting that prior work on the law has rarely examined, and shows how vocabulary scaling varies by part of speech and conversational context.

Findings

01

Vocabulary scaling differs by parts of speech.

02

Conversation type influences vocabulary growth patterns.

03

Fictional and real conversations show distinct statistical behaviors.

Abstract

Conversation is a cornerstone of social connection and is linked to well-being outcomes. Conversations vary widely in type, with some portion generating complex, dynamic stories. One approach to studying how conversations unfold in time is through statistical patterns such as Heaps' law, which holds that vocabulary size scales sublinearly with document length. Little work on Heaps' law has looked at conversation or considered how language features affect scaling. We measure Heaps' law for conversations recorded in two distinct mediums: (1) strangers brought together on video chat and (2) fictional characters in movies. We find that the scaling of vocabulary size differs by part of speech. We discuss these findings through behavioral and linguistic frameworks.
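
To make the scaling relationship concrete, here is a minimal sketch of estimating a Heaps' law exponent from a token sequence. It assumes the standard power-law form V(n) ≈ K·n^β, where V(n) is the number of distinct tokens among the first n tokens, and fits β by ordinary least squares in log-log space. The function names `vocabulary_growth` and `fit_heaps` are hypothetical, and this is not necessarily the fitting procedure used in the paper.

```python
import math


def vocabulary_growth(tokens):
    """Return [V(1), ..., V(N)], where V(n) counts distinct tokens in the first n."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def fit_heaps(tokens):
    """Fit log V(n) = log K + beta * log n by least squares; return (K, beta).

    Assumption: a plain log-log regression over every prefix length.
    Published analyses often use more careful estimators.
    """
    growth = vocabulary_growth(tokens)
    if len(growth) < 2:
        raise ValueError("need at least two tokens to fit a slope")
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    K = math.exp(mean_y - beta * mean_x)
    return K, beta


if __name__ == "__main__":
    # Toy transcript; a real analysis would use tokenized conversation data.
    tokens = "a b a c a b d".split()
    K, beta = fit_heaps(tokens)
    print(f"K = {K:.3f}, beta = {beta:.3f}")
```

To compare parts of speech, one could apply the same fit to the subsequence of tokens carrying a given tag (for example, only nouns), which would require a part-of-speech tagger that this sketch does not include.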

Taxonomy

Topics: Language and cultural evolution · Authorship Attribution and Profiling · Complex Systems and Time Series Analysis