Estranged Predictions: Measuring Semantic Category Disruption with Masked Language Modelling
Yuxuan Liu, Haim Dubossarsky, Ruth Ahnert

TL;DR
This study uses masked language models to quantify how science fiction destabilizes traditional ontological categories like human, animal, and machine, revealing genre-specific semantic permeability and conceptual slippage.
Contribution
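It introduces a computational approach to measuring ontological disruption in science fiction: a masked language model (RoBERTa) generates substitutes for masked category terms, and three metrics (retention rate, replacement rate, and entropy) quantify how far those substitutes drift across the human, animal, and machine categories.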
Findings
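Compared with general fiction, science fiction shows heightened conceptual permeability, particularly around machine referents, which exhibit significant cross-category substitution and dispersion.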
Human terms tend to retain semantic coherence and hierarchical structure.
MLMs can detect genre-conditioned ontological assumptions through probabilistic modelling.
Abstract
This paper examines how science fiction destabilises ontological categories by measuring conceptual permeability across the terms human, animal, and machine using masked language modelling (MLM). Drawing on corpora of science fiction (Gollancz SF Masterworks) and general fiction (NovelTM), we operationalise Darko Suvin's theory of estrangement as computationally measurable deviation in token prediction, using RoBERTa to generate lexical substitutes for masked referents and classifying them via Gemini. We quantify conceptual slippage through three metrics: retention rate, replacement rate, and entropy, mapping the stability or disruption of category boundaries across genres. Our findings reveal that science fiction exhibits heightened conceptual permeability, particularly around machine referents, which show significant cross-category substitution and dispersion. Human terms, by…
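The three metrics named in the abstract can be illustrated with a minimal sketch. The definitions below are assumptions made for illustration, since the abstract does not spell them out: retention rate is taken as the share of predicted substitutes that stay in the masked referent's own category, replacement rate as the share landing in each other category, and entropy as the Shannon entropy (in bits) of the category distribution. The function name `slippage_metrics` and the label format are hypothetical; in the paper the substitutes come from RoBERTa and the labels from Gemini, neither of which is called here.

```python
import math
from collections import Counter

# The three ontological categories studied in the paper.
CATEGORIES = ("human", "animal", "machine")


def slippage_metrics(source_category, substitute_labels):
    """Summarise category labels of substitutes predicted for one masked referent.

    source_category: category of the masked term (e.g. "machine").
    substitute_labels: category label of each predicted substitute.
    NOTE: these metric definitions are illustrative assumptions,
    not the paper's confirmed formulas.
    """
    if not substitute_labels:
        raise ValueError("need at least one substitute label")

    counts = Counter(substitute_labels)
    total = len(substitute_labels)

    # Probability of each category, including categories never predicted.
    probs = {c: counts.get(c, 0) / total for c in set(counts) | set(CATEGORIES)}

    # Share of substitutes that remain in the source category.
    retention = probs.get(source_category, 0.0)

    # Share of substitutes that cross into each other category.
    replacement = {c: p for c, p in probs.items() if c != source_category}

    # Shannon entropy in bits; higher means more dispersed predictions.
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)

    return {
        "retention_rate": retention,
        "replacement_rate": replacement,
        "entropy": entropy,
    }


if __name__ == "__main__":
    # Hypothetical labels for substitutes of a masked machine term.
    labels = ["machine", "machine", "human", "animal"]
    print(slippage_metrics("machine", labels))
```

Under these assumed definitions, a referent whose substitutes all stay in one category has entropy 0, while an even spread over the three categories gives the maximum of log2(3) bits, so higher entropy corresponds to the dispersion the abstract describes for machine referents.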
Taxonomy
Topics: Language and cultural evolution · Narrative Theory and Analysis · Computational and Text Analysis Methods
