Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction
Beicheng Bei, Hannah Hyesun Chun, Chen Guo, Arwa Saghiri

TL;DR
This paper investigates whether BERT embeddings encode narrative dimensions such as time, space, causality, and character in fiction, using token-level probing and clustering analyses.
Contribution
It shows that BERT encodes meaningful narrative information but does not represent these dimensions as discrete clusters, highlighting the complexity of narrative encoding.
Findings
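A linear probe on BERT embeddings achieves 94% accuracy in classifying tokens by narrative dimension, versus 47% for a control probe on variance-matched random embeddings.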
Even with balanced class weighting, rare categories such as causality and space have lower recall (0.75 and 0.66, respectively).
Unsupervised clustering of BERT embeddings aligns poorly with the narrative categories (ARI = 0.081).
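The ARI figure measures chance-corrected agreement between unsupervised cluster assignments and the gold narrative labels: 1.0 means identical partitions, and values near 0 mean roughly chance-level agreement. The page does not say which clustering algorithm or ARI implementation the authors used, so the following is only a minimal standard-library sketch of the standard adjusted Rand index, not the paper's code; `adjusted_rand_index` is a hypothetical helper name.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two labelings of the same tokens.

    Hypothetical helper for illustration; labels may be any hashable values,
    and cluster IDs do not need to match category names.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # too few tokens to form any pair

    # Contingency counts: tokens sharing each (category, cluster) pair.
    pairs = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)  # tokens per gold narrative category
    cols = Counter(labels_pred)  # tokens per predicted cluster

    index = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:  # degenerate case, e.g. one class and one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


# Toy labels for illustration only (not from the paper's dataset).
gold = ["time", "time", "space", "space"]
clusters = [1, 1, 0, 0]
print(adjusted_rand_index(gold, clusters))  # → 1.0
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # → -0.5
```

Because ARI compares partitions rather than names, relabeled clusters still score 1.0; a score as low as 0.081 therefore suggests the clusters cut across the narrative categories rather than merely carrying different names.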
Abstract
Narrative understanding requires multidimensional semantic structures. This study investigates whether BERT embeddings encode dimensions of fictional narrative semantics: time, space, causality, and character. Using an LLM to accelerate annotation, we construct a token-level dataset labeled with these four narrative categories plus "others." A linear probe on BERT embeddings (94% accuracy) significantly outperforms a control probe on variance-matched random embeddings (47%), confirming that BERT encodes meaningful narrative information. With balanced class weighting, the probe achieves a macro-average recall of 0.83, with moderate success on rare categories such as causality (recall = 0.75) and space (recall = 0.66). However, confusion matrix analysis reveals "Boundary Leakage," where rare dimensions are systematically misclassified as "others." Clustering analysis shows that…
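The probe metrics in the abstract can all be read off a confusion matrix. The following standard-library Python sketch shows three pieces: class weights under the common "balanced" convention n_samples / (n_classes * class_count), which is how scikit-learn defines `class_weight="balanced"` (whether the authors used exactly this scheme is an assumption); macro-average recall, the unweighted mean of per-class recall, so that a rare category such as causality counts as much as "others"; and a hypothetical `leakage_to_others` measure of Boundary Leakage, taken here as the share of each category's gold tokens that the probe labels "others". The abstract gives no formula for Boundary Leakage, so that definition is illustrative only, and the toy labels are invented for demonstration, not drawn from the paper's dataset.

```python
from collections import Counter

# The four narrative categories plus the catch-all class, as in the abstract.
LABELS = ["time", "space", "causality", "character", "others"]


def balanced_class_weights(y):
    """n_samples / (n_classes * class_count), the scikit-learn 'balanced' rule."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are gold categories, columns are predicted categories."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    index = {lab: i for i, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def per_class_recall(m, labels=LABELS):
    """Recall per category, skipping categories absent from the gold labels."""
    recalls = {}
    for i, lab in enumerate(labels):
        support = sum(m[i])
        if support:
            recalls[lab] = m[i][i] / support
    return recalls


def macro_recall(m, labels=LABELS):
    """Unweighted mean of per-class recall over categories that occur."""
    r = per_class_recall(m, labels)
    return sum(r.values()) / len(r)


def leakage_to_others(m, labels=LABELS, sink="others"):
    """Hypothetical Boundary Leakage measure: share of each non-sink
    category's gold tokens that were predicted as the sink class."""
    j = labels.index(sink)
    return {lab: m[i][j] / sum(m[i])
            for i, lab in enumerate(labels)
            if lab != sink and sum(m[i])}


# Toy labels for illustration only (not from the paper's dataset).
y_true = ["time", "time", "space", "space", "others", "others", "others", "others"]
y_pred = ["time", "time", "space", "others", "others", "others", "others", "time"]
m = confusion_matrix(y_true, y_pred)
print(balanced_class_weights(y_true))
print(macro_recall(m))       # → 0.75
print(leakage_to_others(m))  # → {'time': 0.0, 'space': 0.5}
```

In this toy case, "others" gets the smallest weight because it is the most frequent class, and the single misrouted "space" token shows up as 50% leakage for that category even though overall accuracy is 75%.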
