Mining Asymmetric Intertextuality
Pak Kin Lau, Stuart Michael McManus

TL;DR
This paper presents a scalable, adaptive method for mining asymmetric intertextual relationships in texts, utilizing LLM-assisted normalization, vector similarity, and verification to handle explicit and implicit references across large, evolving corpora.
Contribution
It introduces a novel split-normalize-merge paradigm for detecting asymmetric intertextuality, suitable for dynamic and large-scale literary and historical datasets.
Findings
Effective detection of explicit and implicit intertextual links.
Scalable approach adaptable to growing corpora.
Utilizes LLMs for metadata extraction and verification.
Abstract
This paper introduces a new task in Natural Language Processing (NLP) and Digital Humanities (DH): Mining Asymmetric Intertextuality. Asymmetric intertextuality refers to one-sided relationships between texts, where one text cites, quotes, or borrows from another without reciprocation. These relationships are common in literature and historical texts, where a later work references a classical or older text that remains static. We propose a scalable and adaptive approach for mining asymmetric intertextuality, leveraging a split-normalize-merge paradigm. In this approach, documents are split into smaller chunks, normalized into structured data using LLM-assisted metadata extraction, and merged during querying to detect both explicit and implicit intertextual relationships. Our system handles intertextuality at various levels, from direct quotations to paraphrasing and cross-document…
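As a rough illustration of the split-normalize-merge idea described in the abstract, the following minimal sketch may help. It is not the authors' implementation: the function names (`split`, `normalize`, `cosine`, `mine`), the chunk size, the similarity threshold, and the sample texts are all hypothetical. A bag-of-words cosine similarity stands in for the vector similarity step, a simple tokenizer stands in for LLM-assisted metadata extraction, and the LLM verification step is omitted entirely.

```python
import math
import re
from collections import Counter


def split(text, size=12):
    """Split a document into chunks of at most `size` words (size is an assumed value)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def normalize(doc_id, chunk):
    """Toy stand-in for LLM-assisted normalization: a structured record
    holding the source id, the raw chunk, and a bag-of-words vector."""
    tokens = re.findall(r"[a-z']+", chunk.lower())
    return {"doc": doc_id, "text": chunk, "vec": Counter(tokens)}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mine(later_doc, index, threshold=0.5):
    """Merge step at query time: compare each chunk of the later text against
    the static index and return directed links (later chunk -> earlier doc)."""
    links = []
    for chunk in split(later_doc):
        query = normalize("query", chunk)
        for record in index:
            score = cosine(query["vec"], record["vec"])
            if score >= threshold:
                links.append((chunk, record["doc"], round(score, 2)))
    return links


# Hypothetical static corpus of earlier texts, indexed once.
corpus = {
    "classic": "To be or not to be that is the question",
    "other": "The quick brown fox jumps over the lazy dog",
}
index = [normalize(doc_id, c) for doc_id, text in corpus.items() for c in split(text)]

# A later text that borrows from "classic"; the earlier text never refers back.
later = "As the prince asked: to be or not to be, that is the question."
links = mine(later, index)
for chunk, source, score in links:
    print(f"{chunk!r} -> {source} (similarity {score})")
```

The links are directed from the later text to the earlier one, which mirrors the asymmetry in the task definition: the indexed corpus stays static, and only the querying document is matched against it. New later texts can be queried without rebuilding the index.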
Taxonomy
Topics: Natural Language Processing Techniques
