Automated Motif Indexing on the Arabian Nights
Ibrahim H. Alyami, Mark A. Finlayson

TL;DR
This paper introduces the first computational method for motif indexing in folklore texts. It pairs the Arabian Nights with El-Shamy's detailed motif index, and its best model, a fine-tuned Llama3, reaches an F1 score of 0.85.
Contribution
The paper presents a novel automated approach to motif indexing, built on an annotated corpus of 58,450 sentences and comparing several NLP techniques, including fine-tuned LLMs.
Findings
Fine-tuned Llama3 achieves an F1 score of 0.85.
Several approaches were tested; fine-tuned LLMs performed best.
An annotated corpus of 58,450 sentences was created for training and testing.
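For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below shows only the standard arithmetic; the function name and the counts are hypothetical and are not taken from the paper, whose exact evaluation setup is not described in this summary.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall (0.0 if there are no true positives)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the calculation:
# 8 correct motif labels, 2 spurious labels, 2 missed labels.
print(round(f1_score(8, 2, 2), 2))
```

Because the harmonic mean is pulled toward the lower of the two values, a high F1 requires both precision and recall to be high.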
Abstract
Motifs are non-commonplace, recurring narrative elements, often found originally in folk stories. In addition to being of interest to folklorists, motifs appear as metaphoric devices in modern news, literature, propaganda, and other cultural texts. Finding expressions of motifs in the original folkloristic text is useful for both folkloristic analysis (motif indexing) as well as for understanding the modern usage of motifs (motif detection and interpretation). Prior work has primarily shown how difficult these problems are to tackle using automated techniques. We present the first computational approach to motif indexing. Our choice of data is a key enabler: we use a large, widely available text (the Arabian Nights) paired with a detailed motif index (by El-Shamy in 2006), which overcomes the common problem of inaccessibility of texts referred to by the index. We created a manually…
Taxonomy
Topics: Folklore, Mythology, and Literature Studies · Artificial Intelligence in Games · Language and cultural evolution
