Semantic Novelty Trajectories in 80,000 Books: A Cross-Corpus Embedding Analysis
Fred Zimmerman

TL;DR
This study analyzes semantic novelty trajectories in more than 80,000 English-language books spanning two centuries. Modern books show higher paragraph-level novelty and more circuitous trajectories, pre-1920 books more often follow convergent narrative curves, and novelty is independent of reader ratings.
Contribution
The paper applies embedding-trajectory analysis at corpus scale to compare pre-1920 literature (PG19) with modern literature (Books3), uncovering structural differences in how narrative novelty unfolds.
Findings
Modern books have higher paragraph-level novelty.
Trajectory circuitousness is 67% higher in the modern corpus.
Pre-1920 literature more often shows convergent narrative curves.
Abstract
I apply Schmidhuber's compression progress theory of interestingness at corpus scale, analyzing semantic novelty trajectories in more than 80,000 books spanning two centuries of English-language publishing. Using sentence-transformer paragraph embeddings and a running-centroid novelty measure, I compare 28,730 pre-1920 Project Gutenberg books (PG19) against 52,796 modern English books (Books3, approximately 1990-2010). The principal findings are fourfold. First, mean paragraph-level novelty is roughly 10% higher in modern books (0.503 vs. 0.459). Second, trajectory circuitousness -- the ratio of cumulative path length to net displacement in embedding space -- nearly doubles in the modern corpus (+67%). Third, convergent narrative curves, in which novelty declines toward a settled semantic register, are 2.3x more common in pre-1920 literature. Fourth, novelty is orthogonal to reader…
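The two measures named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes that running-centroid novelty is the cosine distance between a paragraph embedding and the mean of all preceding paragraph embeddings, and that path length and displacement are Euclidean. The function names `running_centroid_novelty` and `circuitousness` are hypothetical, and the toy vectors stand in for sentence-transformer paragraph embeddings.

```python
import numpy as np


def running_centroid_novelty(embeddings):
    """Assumed definition: cosine distance of paragraph i from the
    centroid (mean) of paragraphs 0..i-1. Returns n-1 values."""
    emb = np.asarray(embeddings, dtype=float)
    # Cumulative sums let us form every running centroid in one pass.
    csum = np.cumsum(emb, axis=0)
    counts = np.arange(1, len(emb))[:, None]
    centroids = csum[:-1] / counts      # centroid seen before paragraph i
    current = emb[1:]                   # paragraphs 1..n-1
    dots = np.sum(current * centroids, axis=1)
    norms = np.linalg.norm(current, axis=1) * np.linalg.norm(centroids, axis=1)
    return 1.0 - dots / norms


def circuitousness(embeddings):
    """Ratio of cumulative path length to net displacement
    (Euclidean distance is an assumption of this sketch)."""
    emb = np.asarray(embeddings, dtype=float)
    steps = np.linalg.norm(np.diff(emb, axis=0), axis=1)
    net = np.linalg.norm(emb[-1] - emb[0])
    if net == 0:
        return float("inf")             # path returns to its starting point
    return steps.sum() / net


# Toy 2-D "embeddings" for a four-paragraph text (illustrative only).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(running_centroid_novelty(toy))
print(circuitousness(toy))
```

Under these assumptions, a text that moves in a straight line through embedding space has circuitousness 1, and any backtracking or wandering pushes the ratio above 1.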
Taxonomy
Topics: Authorship Attribution and Profiling · Narrative Theory and Analysis · Media Influence and Health
