Urdu Morphology, Orthography and Lexicon Extraction

Muhammad Humayoun; Harald Hammarström; Aarne Ranta

arXiv:2204.03071·cs.CL·April 8, 2022·51 cites

Open Access

TL;DR

This paper presents a software API for Urdu language processing that covers orthography, morphology, and lexicon extraction. It is built on a reusable morphology toolkit, so it can support applications such as keyword search, language training, and syntax analysis.

Contribution

The paper implements Urdu orthography, morphology, and lexicon extraction in the Functional Morphology toolkit, so the result can be reused as a library by other language applications.

Findings

01 Successful implementation of Urdu orthography and morphology modules

02 Reusable toolkit facilitates applications like keyword search and language training

03 Demonstrated basic Urdu syntax processing capabilities

Abstract

Urdu is a challenging language to process for two reasons: first, its Perso-Arabic script and, second, a morphological system that inherits grammatical forms and vocabulary from Arabic, Persian and the native languages of South Asia. This paper describes an implementation of the Urdu language as a software API, covering orthography, morphology and the extraction of the lexicon. The morphology is implemented in a toolkit called Functional Morphology (Forsberg & Ranta, 2004), which is based on the idea of treating grammars as software libraries. This implementation can therefore be reused in applications such as intelligent keyword search, language training and infrastructure for syntax. We also present an implementation of a small part of Urdu syntax to demonstrate this reusability.
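To make the "grammars as software libraries" idea concrete, here is a minimal sketch in Python. It is not the paper's code and not the Functional Morphology API. The function names (`masc_a_noun`, `build_analyzer`), the informal ASCII romanization, and the choice of the masculine "-a" noun class (as in "larka", boy) are all assumptions made for illustration. A paradigm is written as an ordinary function from a lemma to its inflection table, and the same function is then reused to build an analyzer.

```python
from __future__ import annotations


def masc_a_noun(lemma: str) -> dict[tuple[str, str], str]:
    """Hypothetical paradigm function: inflection table for a masculine
    noun ending in -a, keyed by (number, case), in ASCII romanization."""
    if not lemma.endswith("a"):
        raise ValueError(f"{lemma!r} does not belong to the -a noun class")
    stem = lemma[:-1]
    return {
        ("sg", "direct"): stem + "a",
        ("sg", "oblique"): stem + "e",
        ("sg", "vocative"): stem + "e",
        ("pl", "direct"): stem + "e",
        ("pl", "oblique"): stem + "on",
        ("pl", "vocative"): stem + "o",
    }


def build_analyzer(lemmas: list[str]) -> dict[str, list[tuple[str, str, str]]]:
    """Invert the paradigm over a small lexicon: map each surface form to
    every (lemma, number, case) analysis that produces it."""
    index: dict[str, list[tuple[str, str, str]]] = {}
    for lemma in lemmas:
        for (number, case), form in masc_a_noun(lemma).items():
            index.setdefault(form, []).append((lemma, number, case))
    return index


if __name__ == "__main__":
    analyzer = build_analyzer(["larka", "kamra"])
    # An ambiguous form yields several analyses.
    for analysis in analyzer["larke"]:
        print(analysis)
```

Because the paradigm is a plain function, a keyword search tool could expand a query lemma into all of its forms with `masc_a_noun`, while a parser could look up forms through `build_analyzer`, both without duplicating the inflection rules.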


Taxonomy

Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Mathematics, Computing, and Information Processing