Evaluating and Adapting Large Language Models to Represent Folktales in Low-Resource Languages
JA Meaney, Beatrice Alex, William Lamb

TL;DR
This paper evaluates the effectiveness of large language models in representing folktales from low-resource languages like Irish and Gaelic, proposing adaptations to improve their performance in classification tasks.
Contribution
It introduces three adaptations to enhance LLM performance on folktale classification and compares them with baseline models, highlighting the impact of domain-specific pre-training and sequence length.
Findings
Adapting models for longer sequences improves classification accuracy.
Continued pre-training on the folktale domain enhances model performance.
A baseline SVM with non-contextual features performs strongly by comparison, which tempers these gains.
Abstract
Folktales are a rich resource of knowledge about the society and culture of a civilisation. Digital folklore research aims to use automated techniques to better understand these folktales, and it relies on abstract representations of the textual data. Although a number of large language models (LLMs) claim to be able to represent low-resource languages such as Irish and Gaelic, we present two classification tasks to explore how useful these representations are, and three adaptations to improve the performance of these models. We find that adapting the models to work with longer sequences and continuing pre-training on the domain of folktales improve classification performance, although these findings are tempered by the impressive performance of a baseline SVM with non-contextual features.
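As an illustration of why sequence length matters for long texts such as folktales, the sketch below shows one common way to obtain a single representation for a document longer than an encoder's input limit: split it into overlapping windows, encode each window, and average the results. This is not taken from the paper, whose specific long-sequence adaptation is not described in this summary. The names `sliding_windows`, `mean_pool`, `embed_long_text`, and `toy_encode` are hypothetical, and `toy_encode` is a stand-in for a real model.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def sliding_windows(tokens: Sequence[str], max_len: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len tokens."""
    if max_len <= 0 or stride <= 0 or stride > max_len:
        raise ValueError("require 0 < stride <= max_len")
    if not tokens:
        return []
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # the last window reached the end of the text
        start += stride
    return windows


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors element-wise."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def embed_long_text(
    tokens: Sequence[str],
    encode: Callable[[List[str]], Vector],
    max_len: int = 512,
    stride: int = 256,
) -> Vector:
    """Encode each window separately, then average the window vectors."""
    windows = sliding_windows(tokens, max_len, stride)
    if not windows:
        raise ValueError("cannot embed an empty token sequence")
    return mean_pool([encode(w) for w in windows])


def toy_encode(window: List[str]) -> Vector:
    """Hypothetical stand-in for a real encoder (assumption, not a model).

    Returns [number of tokens, mean token length] for the window.
    """
    return [float(len(window)), sum(len(t) for t in window) / len(window)]


if __name__ == "__main__":
    tokens = "bhí fear ann fadó agus bhí triúr mac aige sa bhaile".split()
    print(embed_long_text(tokens, toy_encode, max_len=4, stride=2))
```

In practice, `toy_encode` would be replaced by a call to a pre-trained model, and the pooled vector would be passed to a classifier. Averaging is only one pooling choice; it treats every window as equally informative, which may not hold for narrative text.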
Taxonomy
Topics: Folklore, Mythology, and Literature Studies · Digital Humanities and Scholarship · Natural Language Processing Techniques
Methods: Support Vector Machine
