Statistical Parsing by Machine Learning from a Classical Arabic Treebank
Kais Dukes

TL;DR
This thesis advances statistical parsing for Classical Arabic with a hybrid representation and joint dependency-constituency parsing, which it reports to be more accurate than pure dependency parsing.
Contribution
It introduces a hybrid parsing model aligned with traditional grammar and demonstrates its superiority over pure dependency parsing for Classical Arabic.
Findings
Hybrid representation improves parsing accuracy.
Joint dependency-constituency parsing outperforms dependency-only models.
The integrated model achieves an F1-score of 89.03%.
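For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score over labeled edges can be computed. It assumes F1 is the harmonic mean of precision and recall over (head, dependent, label) triples; the summary above does not state the exact evaluation protocol used in the thesis, and the `Edge` type, the `f1_score` function, and the toy edges are hypothetical illustrations, not data from the treebank.

```python
from typing import Set, Tuple

# Hypothetical edge encoding: (head index, dependent index, relation label).
Edge = Tuple[int, int, str]


def f1_score(gold: Set[Edge], predicted: Set[Edge]) -> float:
    """Harmonic mean of precision and recall over labeled edges."""
    if not gold or not predicted:
        return 0.0
    correct = len(gold & predicted)        # edges matching in head, dependent, and label
    precision = correct / len(predicted)   # share of predicted edges that are correct
    recall = correct / len(gold)           # share of gold edges that were recovered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (invented): the parser attaches token 4 to the wrong head.
gold = {(2, 1, "subj"), (0, 2, "root"), (2, 3, "obj"), (3, 4, "adj")}
predicted = {(2, 1, "subj"), (0, 2, "root"), (2, 3, "obj"), (2, 4, "adj")}
score = f1_score(gold, predicted)
print(f"F1 = {score:.2%}")
```

Comparing sets of edges rather than per-token heads lets the same function score structures where predicted and gold edge counts differ, which can happen when a hybrid representation includes phrase-level nodes.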
Abstract
Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
