Joint Diacritization, Lemmatization, Normalization, and Fine-Grained Morphological Tagging

Nasser Zalmout; Nizar Habash

arXiv:1910.02267 · cs.CL · October 8, 2019

TL;DR

This paper presents a joint model for diacritization, lemmatization, normalization, and fine-grained morphological tagging in Semitic languages. Evaluated on Arabic, it reduces error on both Modern Standard Arabic and Egyptian Arabic.

Contribution

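It introduces a unified model that jointly handles lexicalized features (such as lemmas and diacritized forms) and non-lexicalized features (such as gender, number, and part-of-speech tags) despite their different modeling granularities, achieving state-of-the-art results for Arabic.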

Findings

01. 20% relative error reduction for Modern Standard Arabic

02. 11% error reduction for Egyptian Arabic

03. Effective joint modeling of multiple morphological features
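
To make the "relative error reduction" figures above concrete, here is a minimal sketch of how such a number is typically computed from two accuracy scores. The function name and the accuracies used are hypothetical illustrations, not values or code from the paper.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Return the fraction of the baseline's errors removed by the new system.

    Both arguments are accuracies in [0, 1]; error is taken as 1 - accuracy.
    """
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    if baseline_err <= 0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_err - new_err) / baseline_err


# Hypothetical accuracies chosen only for illustration (NOT from the paper).
rer = relative_error_reduction(baseline_acc=0.90, new_acc=0.92)
print(f"relative error reduction: {rer:.0%}")
```

Under these assumed numbers, a two-point absolute gain in accuracy (90% to 92%) removes about a fifth of the baseline's errors, which is why relative figures can look much larger than the corresponding absolute improvement.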

Abstract

Semitic languages can be highly ambiguous, having several interpretations of the same surface forms, and morphologically rich, having many morphemes that realize several morphological features. This is further exacerbated for dialectal content, which is more prone to noise and lacks a standard orthography. The morphological features can be lexicalized, like lemmas and diacritized forms, or non-lexicalized, like gender, number, and part-of-speech tags, among others. Joint modeling of the lexicalized and non-lexicalized features can identify more intricate morphological patterns, which provide better context modeling, and further disambiguate ambiguous lexical choices. However, the different modeling granularity can make joint modeling more difficult. Our approach models the different features jointly, whether lexicalized (on the character-level), where we also model surface form…
