Uzbek affix finite state machine for stemming

Maksud Sharipov; Ulugbek Salaev

arXiv:2205.10078·cs.CL·May 23, 2022

Open Access

TL;DR

This paper introduces a finite state machine-based morphological analyzer for Uzbek that performs high-speed, lexicon-free analysis by modeling affix sequences in right-to-left order, tailored to the language's agglutinative structure.

Contribution

It presents a novel FSM-based approach for Uzbek morphology that models all word classes without lexicons, improving speed and memory efficiency.

Findings

01 High-speed morphological analysis achieved
02 No lexicon required for analysis
03 FSMs modeled for all Uzbek word classes

Abstract

This work presents a morphological analyzer for the Uzbek language based on a finite state machine. The proposed method analyzes Uzbek words by stripping affixes to find the root, without using any lexicon. This allows words from large amounts of text to be analyzed at high speed, and no memory is needed to store a vocabulary. Because Uzbek is an agglutinative language, its morphology can be modeled with finite state machines (FSMs). In contrast to previous work, this study models complete FSMs for all word classes, applying the morphotactic rules of Uzbek in right-to-left order. The paper describes the stages of the methodology: classifying the affixes, generating an FSM for each affix class, and combining these into a head machine that analyzes a word.
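
The right-to-left affix stripping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the affix lists are a small hypothetical subset of Uzbek noun suffixes, and the state names, the `MIN_STEM` guard, and the function `strip_affixes` are assumptions made for this example. The paper's head machine covers all word classes; this sketch covers only a simplified noun pattern (root + plural + possessive + case).

```python
# Hypothetical, heavily simplified noun affix classes (not the paper's inventory).
CASE = ("ning", "dan", "ni", "ga", "da")
POSSESSIVE = ("ingiz", "imiz", "ing", "im", "i")
PLURAL = ("lar",)

# Transitions are read right to left, starting at the end of the word.
# state -> list of (affix class label, affixes, next state)
TRANSITIONS = {
    "START": [("case", CASE, "CASE"),
              ("possessive", POSSESSIVE, "POSS"),
              ("plural", PLURAL, "PLURAL")],
    "CASE": [("possessive", POSSESSIVE, "POSS"),
             ("plural", PLURAL, "PLURAL")],
    "POSS": [("plural", PLURAL, "PLURAL")],
    "PLURAL": [],
}

# Assumed guard against stripping a word down to nothing.
MIN_STEM = 3


def strip_affixes(word):
    """Return (stem, affixes) where affixes are listed in left-to-right order."""
    state = "START"
    stem = word
    found = []
    while True:
        for label, affixes, next_state in TRANSITIONS[state]:
            # Prefer the longest affix of this class that still leaves a stem.
            candidates = sorted(affixes, key=len, reverse=True)
            match = next(
                (a for a in candidates
                 if stem.endswith(a) and len(stem) - len(a) >= MIN_STEM),
                None,
            )
            if match is not None:
                stem = stem[: -len(match)]
                found.append((label, match))
                state = next_state
                break
        else:
            # No transition fired from the current state: analysis is done.
            return stem, found[::-1]


print(strip_affixes("kitoblarimni"))
```

For "kitoblarimni" the sketch strips the case suffix "ni", then the possessive "im", then the plural "lar", leaving the stem "kitob". Because no lexicon is consulted, a stripper of this kind can also remove letters that merely look like an affix, so its output is best read as a candidate analysis.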

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Fuzzy Logic and Control Systems
