LLMs for automatic annotation of Mandarin narrative transcripts
Qingwen Zhao, Hongao Zhu, Yunqi He, Rui Wang, Aijun Huang, Hai Hu

TL;DR
This study assesses how well Large Language Models can automate discourse-level annotation of Mandarin narratives. The best model approached human-human agreement and cut annotation time by 65%, but reliability dropped on narratives with greater lexical variation.
Contribution
It provides the first systematic evaluation of LLMs for macrostructure annotation in non-English spoken narratives, with open-sourced prompt templates.
Findings
Best LLM achieved agreement with human raters of κ = .794
Annotation time reduced by 65% with LLMs
Model reliability decreased on narratives with greater lexical variation
Abstract
Linguistic annotation of transcribed speech is essential for research in language acquisition, language disorders, and sociolinguistics, yet remains labor-intensive and time-consuming. While Large Language Models (LLMs) have shown promise in automating annotation tasks, their ability to handle complex discourse-level annotation in non-English languages remains understudied. This study evaluates whether LLMs can reliably annotate narrative macrostructure (the hierarchical organization of story grammar elements) in spoken Mandarin, using the Multilingual Assessment Instrument for Narratives (MAIN) as a testbed. We compared four LLMs against trained human annotators on narratives produced by children, young adults, and older adults. The best-performing model achieved agreement with human raters (κ = .794) approaching human-human reliability levels (κ = .872) while reducing annotation time by…
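The agreement figures above are reported as kappa values. As a rough illustration of how such a statistic is computed, the sketch below implements Cohen's kappa for two annotators. It assumes the paper's statistic is Cohen's kappa and that each story grammar element is coded as present (1) or absent (0); neither detail is confirmed by the summary. The function name `cohen_kappa` and the two label lists are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical present/absent codes for eight story grammar elements.
human = [1, 1, 0, 1, 0, 0, 1, 0]
model = [1, 1, 0, 0, 0, 0, 1, 1]

kappa = cohen_kappa(human, model)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 6 of 8 items (0.75 observed agreement) while chance agreement is 0.5, so kappa is 0.5. Kappa discounts agreement that would occur by chance, which is why it is lower than raw percent agreement.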
