Semantic Novelty at Scale: Narrative Shape Taxonomy and Readership Prediction in 28,606 Books
W. Frederick Zimmerman

TL;DR
This paper introduces semantic novelty, a new measure of narrative structure, applies it to 28,606 books to identify archetypal narrative shapes, and finds that narrative dynamics significantly predict readership engagement and vary by genre and historical period.
Contribution
It proposes semantic novelty as a new, scalable measure of narrative structure and uncovers canonical narrative archetypes and their relation to readership and genre.
Findings
Eight canonical narrative shape archetypes identified.
Volume (the variance of the novelty trajectory) predicts readership engagement.
Genre and historical trends influence narrative shape and predictability.
Abstract
I introduce semantic novelty, the cosine distance between each paragraph's sentence embedding and the running centroid of all preceding paragraphs, as an information-theoretic measure of narrative structure at corpus scale. Applying it to 28,606 books in PG19 (pre-1920 English literature), I compute paragraph-level novelty curves using 768-dimensional SBERT embeddings, then reduce each to a 16-segment Piecewise Aggregate Approximation (PAA). Ward-linkage clustering on PAA vectors reveals eight canonical narrative shape archetypes, from Steep Descent (rapid convergence) to Steep Ascent (escalating unpredictability). Volume, the variance of the novelty trajectory, is the strongest length-independent predictor of readership (partial rho = 0.32), followed by speed (rho = 0.19) and Terminal/Initial ratio (rho = 0.19). Circuitousness shows strong raw correlation (rho = 0.41) but is 93 percent…
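The novelty curve, its 16-segment PAA reduction, and the volume statistic described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the first paragraph is skipped because it has no predecessors, the centroid is taken over raw (unnormalized) embeddings, and PAA segments are formed by splitting the curve into near-equal chunks. The embeddings are assumed to be precomputed, for example by an SBERT model, and to be nonzero.

```python
import numpy as np


def semantic_novelty(embeddings):
    """Cosine distance between each paragraph embedding and the running
    centroid of all preceding paragraphs.

    Assumption: paragraph 0 has no predecessors and is skipped, so the
    returned curve has len(embeddings) - 1 points.
    """
    emb = np.asarray(embeddings, dtype=float)
    # Row i of `centroids` is the mean of paragraphs 0..i, used for paragraph i + 1.
    centroids = np.cumsum(emb, axis=0)[:-1] / np.arange(1, len(emb))[:, None]
    current = emb[1:]
    dots = np.sum(current * centroids, axis=1)
    norms = np.linalg.norm(current, axis=1) * np.linalg.norm(centroids, axis=1)
    return 1.0 - dots / norms


def paa(curve, n_segments=16):
    """Piecewise Aggregate Approximation: the mean of each of
    n_segments near-equal consecutive chunks of the curve."""
    curve = np.asarray(curve, dtype=float)
    if len(curve) < n_segments:
        raise ValueError("curve must have at least n_segments points")
    return np.array([chunk.mean() for chunk in np.array_split(curve, n_segments)])


def volume(curve):
    """Variance of the novelty trajectory (called 'volume' in the abstract)."""
    return float(np.var(curve))


if __name__ == "__main__":
    # Stand-in data: 200 random 768-dimensional vectors instead of real SBERT output.
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(200, 768))
    curve = semantic_novelty(fake_embeddings)
    print(curve.shape, paa(curve).shape, volume(curve))
```

The resulting 16-dimensional PAA vectors are what a Ward-linkage clustering step would consume. Using a cumulative sum keeps the centroid update linear in the number of paragraphs, which matters at corpus scale.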
Taxonomy
Topics: Media Influence and Health · Narrative Theory and Analysis · Computational and Text Analysis Methods
