Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction

Beicheng Bei; Hannah Hyesun Chun; Chen Guo; Arwa Saghiri

arXiv:2604.10786·cs.CL·April 14, 2026

TL;DR

This paper investigates whether BERT embeddings encode narrative dimensions like time, space, causality, and character in fiction, using token-level probing and clustering analyses.

Contribution

The paper shows that BERT embeddings carry narrative information that a linear probe can decode, but that the four dimensions do not form discrete clusters in embedding space.

Findings

01

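A linear probe on BERT embeddings reaches 94% accuracy at labeling tokens with narrative categories, versus 47% for a control probe on variance-matched random embeddings.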

02

With balanced class weighting, macro-average recall is 0.83, but rare categories such as causality and space reach only moderate recall (0.75 and 0.66).

03

Unsupervised clustering aligns poorly with the narrative categories (adjusted Rand index, ARI = 0.081).
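For readers unfamiliar with the ARI figure in the third finding, the following is a minimal sketch of how an adjusted Rand index can be computed from two labelings of the same tokens. It uses only the Python standard library; the function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not the authors' code or data. An ARI of 1.0 means the clusters reproduce the categories exactly, and values near 0 mean agreement is about what chance would give.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count items per (category, cluster) cell and per row/column.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that land together under each grouping.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one group on both sides
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical, not from the paper's dataset).
gold = ["time", "time", "space", "space", "character", "character"]
aligned_clusters = [0, 0, 1, 1, 2, 2]      # clusters match the categories
misaligned_clusters = [0, 1, 0, 1, 0, 1]   # clusters cut across the categories

ari_aligned = adjusted_rand_index(gold, aligned_clusters)
ari_misaligned = adjusted_rand_index(gold, misaligned_clusters)
```

Because the index is corrected for chance, a value as low as 0.081 indicates that cluster membership says very little about which narrative category a token belongs to.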

Abstract

Narrative understanding requires multidimensional semantic structures. This study investigates whether BERT embeddings encode dimensions of fictional narrative semantics -- time, space, causality, and character. Using an LLM to accelerate annotation, we construct a token-level dataset labeled with these four narrative categories plus "others." A linear probe on BERT embeddings (94% accuracy) significantly outperforms a control probe on variance-matched random embeddings (47%), confirming that BERT encodes meaningful narrative information. With balanced class weighting, the probe achieves a macro-average recall of 0.83, with moderate success on rare categories such as causality (recall = 0.75) and space (recall = 0.66). However, confusion matrix analysis reveals "Boundary Leakage," where rare dimensions are systematically misclassified as "others." Clustering analysis shows that…
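The probe-versus-control comparison described in the abstract can be sketched roughly as follows. This is a hedged illustration, not the authors' pipeline: it substitutes synthetic Gaussian vectors for real BERT token embeddings, assumes "variance-matched" means matching per-dimension mean and variance, and uses scikit-learn's `LogisticRegression` as the linear probe. The names `run_probe`, `X_real`, and `X_ctrl` are hypothetical, and the synthetic labels are roughly balanced, unlike the dataset the abstract describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
LABELS = ["time", "space", "causality", "character", "others"]

# Synthetic stand-in for token embeddings (assumption): one Gaussian blob per label.
n_tokens, dim = 1000, 16
y = rng.integers(0, len(LABELS), size=n_tokens)
class_means = rng.normal(scale=3.0, size=(len(LABELS), dim))
X_real = class_means[y] + rng.normal(size=(n_tokens, dim))

# Control embeddings: label-independent noise whose per-dimension mean and
# variance match the "real" embeddings (one reading of "variance-matched").
X_ctrl = rng.normal(
    loc=X_real.mean(axis=0), scale=X_real.std(axis=0), size=X_real.shape
)


def run_probe(X, y):
    """Fit a class-balanced linear probe; return (accuracy, macro recall)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y
    )
    probe = LogisticRegression(class_weight="balanced", max_iter=1000)
    probe.fit(X_tr, y_tr)
    pred = probe.predict(X_te)
    return accuracy_score(y_te, pred), recall_score(y_te, pred, average="macro")


real_acc, real_macro_recall = run_probe(X_real, y)
ctrl_acc, ctrl_macro_recall = run_probe(X_ctrl, y)
print(f"real:    acc={real_acc:.2f}  macro recall={real_macro_recall:.2f}")
print(f"control: acc={ctrl_acc:.2f}  macro recall={ctrl_macro_recall:.2f}")
```

The gap between the two probes is what matters: a linear classifier can only beat the control if the label information is present in the embeddings themselves rather than in the probe's own capacity. Macro recall is reported alongside accuracy because it averages per-class recall with equal weight, so rare categories count as much as frequent ones.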
