Arabic Morphosyntactic Tagging and Dependency Parsing with Large Language Models
Mohamed Adel, Bashar Alhafni, Nizar Habash

TL;DR
This paper evaluates instruction-tuned large language models on Arabic morphosyntactic tagging and dependency parsing under zero-shot and retrieval-based prompting, revealing their strengths and limitations in capturing Arabic's complex morphology and syntax.
Contribution
It demonstrates that prompt design and retrieval-based in-context learning strongly affect LLM performance on Arabic morphosyntactic tagging and dependency parsing, highlighting both the potential of LLMs for these tasks and their current limitations.
Findings
Proprietary models approach supervised baselines in feature-level tagging.
Retrieval-based ICL improves parsing and tokenization.
Tokenization remains challenging in raw-text settings.
Abstract
Large language models (LLMs) perform strongly on many NLP tasks, but their ability to produce explicit linguistic structure remains unclear. We evaluate instruction-tuned LLMs on two structured prediction tasks for Standard Arabic: morphosyntactic tagging and labeled dependency parsing. Arabic provides a challenging testbed due to its rich morphology and orthographic ambiguity, which create strong morphology-syntax interactions. We compare zero-shot prompting with retrieval-based in-context learning (ICL) using examples from Arabic treebanks. Results show that prompt design and demonstration selection strongly affect performance: proprietary models approach supervised baselines for feature-level tagging and become competitive with specialized dependency parsers. In raw-text settings, tokenization remains challenging, though retrieval-based ICL improves both parsing and tokenization. Our…
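Retrieval-based ICL, as summarized above, means selecting demonstrations from an annotated treebank that resemble the input sentence and placing them in the prompt before the query. The sketch below illustrates that idea only under stated assumptions: this summary does not say how the authors measure similarity or format their prompts, so the character-bigram Jaccard similarity, the simplified CoNLL-U columns (ID, FORM, UPOS, HEAD, DEPREL), the instruction wording, and all names (`TreebankExample`, `retrieve_demonstrations`, `build_prompt`) are hypothetical. The three-sentence pool is a toy example, not data from any treebank, and the sketch skips clitic tokenization and full morphological features.

```python
from dataclasses import dataclass


@dataclass
class TreebankExample:
    """Hypothetical container: one sentence plus (FORM, UPOS, HEAD, DEPREL) per token."""
    text: str
    rows: list[tuple[str, str, int, str]]


def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Character n-grams of the text with spaces removed (assumed similarity signal)."""
    s = text.replace(" ", "")
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two n-gram sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_demonstrations(query: str, pool: list[TreebankExample], k: int = 2) -> list[TreebankExample]:
    """Return the k pool examples most similar to the query (stable on ties)."""
    q = char_ngrams(query)
    ranked = sorted(pool, key=lambda ex: jaccard(q, char_ngrams(ex.text)), reverse=True)
    return ranked[:k]


def to_conllu(ex: TreebankExample) -> str:
    """Render an example as simplified, tab-separated CoNLL-U-style lines."""
    return "\n".join(
        f"{i}\t{form}\t{upos}\t{head}\t{rel}"
        for i, (form, upos, head, rel) in enumerate(ex.rows, start=1)
    )


def build_prompt(query: str, demos: list[TreebankExample]) -> str:
    """Assumed prompt layout: instruction, retrieved demonstrations, then the query."""
    parts = ["Tag and parse the Arabic sentence. Output one token per line: ID, FORM, UPOS, HEAD, DEPREL."]
    for ex in demos:
        parts.append(f"Sentence: {ex.text}\n{to_conllu(ex)}")
    parts.append(f"Sentence: {query}")
    return "\n\n".join(parts)


# Toy pool for illustration only (not taken from any treebank).
POOL = [
    TreebankExample("كتب الولد الدرس",
                    [("كتب", "VERB", 0, "root"), ("الولد", "NOUN", 1, "nsubj"), ("الدرس", "NOUN", 1, "obj")]),
    TreebankExample("قرأ الطالب الكتاب",
                    [("قرأ", "VERB", 0, "root"), ("الطالب", "NOUN", 1, "nsubj"), ("الكتاب", "NOUN", 1, "obj")]),
    TreebankExample("الولد في البيت",
                    [("الولد", "NOUN", 3, "nsubj"), ("في", "ADP", 3, "case"), ("البيت", "NOUN", 0, "root")]),
]

query = "كتب الطالب الدرس"
demos = retrieve_demonstrations(query, POOL, k=2)
prompt = build_prompt(query, demos)
print(prompt)
```

With this toy pool, the two verbal sentences share more character bigrams with the query than the nominal sentence does, so they are the ones placed in the prompt. A real setup would more likely use a stronger retriever (for example, embedding similarity) over a full treebank; the bigram measure is used here only because it is short and dependency-free.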
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
