Parsing the Switch: LLM-Based UD Annotation for Complex Code-Switched and Low-Resource Languages
Olga Kellert, Nemika Tyagi, Muhammad Imran, Nelvin Licona-Guevara, Carlos Gómez-Rodríguez

TL;DR
This paper introduces the BiLingua Parser, an LLM-based system for generating Universal Dependencies annotations in code-switched text, demonstrating high accuracy and providing new annotated datasets for low-resource language pairs.
Contribution
It presents a novel prompt-based framework for UD annotation of code-switched languages, including the first Spanish-Guaraní UD corpus, and shows LLMs can effectively analyze complex multilingual syntax.
Findings
Achieves up to 95.29% LAS after expert review
Outperforms prior baselines and multilingual parsers
Provides new annotated datasets for low-resource language pairs
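For readers unfamiliar with the metric in the first finding: LAS (labeled attachment score) is the share of tokens whose predicted head and dependency relation both match the gold annotation, while UAS checks the head only. The sketch below shows one minimal way to compute both. It is not the paper's evaluation script; the function name, the example sentence, and its arcs are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency relation label).
# Head index 0 denotes the artificial root, as in CoNLL-U.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS counts tokens whose head is correct, regardless of label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head and label are both correct.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Hypothetical four-token code-switched sentence: "Quiero un coffee ahora".
gold = [(0, "root"), (3, "det"), (1, "obj"), (1, "advmod")]
# Assumed prediction: token 3 has the right head but the wrong label,
# and token 4 has the right label but the wrong head.
pred = [(0, "root"), (3, "det"), (1, "nsubj"), (3, "advmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

In this toy case three of four heads are correct but only two of four head-label pairs are, so LAS is lower than UAS. A reported LAS of 95.29% means roughly 95 of every 100 tokens received both the correct head and the correct relation.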
Abstract
Code-switching presents a complex challenge for syntactic analysis, especially in low-resource language settings where annotated data is scarce. While recent work has explored the use of large language models (LLMs) for sequence-level tagging, few approaches systematically investigate how well these models capture syntactic structure in code-switched contexts. Moreover, existing parsers trained on monolingual treebanks often fail to generalize to multilingual and mixed-language input. To address this gap, we introduce the BiLingua Parser, an LLM-based annotation pipeline designed to produce Universal Dependencies (UD) annotations for code-switched text. First, we develop a prompt-based framework for Spanish-English and Spanish-Guaraní data, combining few-shot LLM prompting with expert review. Second, we release two annotated datasets, including the first Spanish-Guaraní UD-parsed…
Taxonomy
Topics: Natural Language Processing Techniques · Multilingual Education and Policy · ICT in Developing Communities
