Semi-Automatic Data Annotation, POS Tagging and Mildly Context-Sensitive Disambiguation: the eXtended Revised AraMorph (XRAM)

Giuliano Lancioni; Valeria Pettinari; Laura Garofalo; Marta Campanelli; Ivana Pepe; Simona Olivieri; Ilaria Cicola

arXiv:1603.01833·cs.CL·March 8, 2016·1 cites

Open Access

TL;DR

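This paper introduces XRAM, an extended and revised version of Tim Buckwalter's Arabic lexical and morphological resource AraMorph. It widens coverage of Classical and contemporary Arabic, adds probabilistic disambiguation and ranking of alternative analyses, and supports semi-automatic lexical expansion. The authors report a high success level when testing it through a front-end Python module.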

Contribution

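XRAM extends AraMorph with flag-selectable usage markers, probabilistic mildly context-sensitive POS tagging and disambiguation, and semi-automatic lexical expansion drawn from existing lexical resources, addressing weaknesses and inconsistencies of the original model.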

Findings

01. High success rate in practical testing

02. Improved coverage of Classical and contemporary Arabic

03. Enhanced disambiguation and tagging accuracy

Abstract

An extended, revised form of Tim Buckwalter's Arabic lexical and morphological resource AraMorph, eXtended Revised AraMorph (henceforth XRAM), is presented which addresses a number of weaknesses and inconsistencies of the original model by allowing a wider coverage of real-world Classical and contemporary (both formal and informal) Arabic texts. Building upon previous research, XRAM enhancements include (i) flag-selectable usage markers, (ii) probabilistic mildly context-sensitive POS tagging, filtering, disambiguation and ranking of alternative morphological analyses, (iii) semi-automatic increment of lexical coverage through extraction of lexical and morphological information from existing lexical resources. Testing of XRAM through a front-end Python module showed a remarkable success level.
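
Enhancements (i) and (ii) can be illustrated with a minimal Python sketch. It is not XRAM's implementation: the `Analysis` class, the `rank` and `filter_by_usage` functions, the usage marker `"classical"`, the transliterated lemmas, and every probability in `BIGRAM` are hypothetical, invented only to show how usage flags might filter candidate analyses and how the POS of the preceding word might rank the survivors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One candidate morphological analysis of a word (hypothetical structure)."""
    lemma: str
    pos: str
    usage: frozenset = frozenset()  # hypothetical usage markers, e.g. {"classical"}


# Hypothetical probabilities P(pos | previous pos); values invented for illustration.
BIGRAM = {
    ("PREP", "NOUN"): 0.8,
    ("PREP", "VERB"): 0.05,
}
DEFAULT = 0.01  # fallback for unseen POS pairs


def filter_by_usage(analyses, enabled):
    """Keep analyses whose usage markers are all enabled; unmarked ones always pass."""
    return [a for a in analyses if a.usage <= enabled]


def rank(analyses, prev_pos, enabled):
    """Filter by usage flags, then sort by P(pos | prev_pos), most likely first."""
    kept = filter_by_usage(analyses, enabled)
    return sorted(
        kept,
        key=lambda a: BIGRAM.get((prev_pos, a.pos), DEFAULT),
        reverse=True,
    )


# Three illustrative analyses of one ambiguous unvocalized form.
candidates = [
    Analysis("kataba", "VERB"),
    Analysis("kutub", "NOUN"),
    Analysis("kattaba", "VERB", frozenset({"classical"})),
]

# With no usage flags enabled, the "classical" analysis is dropped.
ranked = rank(candidates, "PREP", frozenset())

# Enabling the flag lets the marked analysis back in.
ranked_classical = rank(candidates, "PREP", frozenset({"classical"}))
```

Here the context is a single preceding POS tag, which is one simple reading of "mildly context-sensitive"; the paper's actual model may use a different context window or scoring scheme.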

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies