Uzbek affix finite state machine for stemming
Maksud Sharipov, Ulugbek Salaev

TL;DR
This paper introduces a finite state machine-based morphological analyzer for Uzbek that performs high-speed, lexicon-free analysis by modeling affix sequences in right-to-left order, tailored to the language's agglutinative structure.
Contribution
It presents a novel FSM-based approach for Uzbek morphology that models all word classes without lexicons, improving speed and memory efficiency.
Findings
High-speed morphological analysis achieved
No lexicon required for analysis
FSMs modeled for all Uzbek word classes
Abstract
This work presents a morphological analyzer for the Uzbek language based on finite state machines. The proposed method analyzes Uzbek words by stripping affixes to find the root, without using any lexicon. This allows words in large volumes of text to be analyzed at high speed, and no memory is needed to store a vocabulary. Because Uzbek is an agglutinative language, its morphology can be modeled with finite state machines (FSMs). In contrast to previous work, this study models complete FSMs for all word classes, applying the Uzbek language's morphotactic rules in right-to-left order. The paper describes the stages of the methodology: classifying the affixes, generating an FSM for each affix class, and combining these into a head machine that analyzes a word.
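The right-to-left, lexicon-free stripping described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the affix lists are a tiny hypothetical subset of Uzbek noun suffixes (plural, possessive, case), and the names `AFFIX_CLASSES`, `TRANSITIONS`, `MIN_STEM_LEN`, and `strip_affixes` are assumptions made for this example. The sketch takes the first (longest) match greedily, whereas a fuller analyzer would probably need to consider alternative paths.

```python
from typing import List, Tuple

# Hypothetical, heavily reduced noun-suffix classes (illustrative only,
# not the affix classification used in the paper).
AFFIX_CLASSES = {
    "CASE": ["ning", "dan", "ni", "ga", "da"],
    "POSS": ["imiz", "ingiz", "im", "ing", "i"],
    "PLURAL": ["lar"],
}

# Transitions are read right to left: from each state, the affix classes
# that may be stripped next. Surface order is root + PLURAL + POSS + CASE,
# so stripping proceeds CASE -> POSS -> PLURAL, and any class may be absent.
TRANSITIONS = {
    "START": ["CASE", "POSS", "PLURAL"],
    "CASE": ["POSS", "PLURAL"],
    "POSS": ["PLURAL"],
    "PLURAL": [],
}

# Assumed guard against over-stripping, since no lexicon validates the stem.
MIN_STEM_LEN = 2


def strip_affixes(word: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (stem, [(affix, class), ...]) with affixes in stripping order."""
    state = "START"
    stem = word.lower()
    stripped: List[Tuple[str, str]] = []
    while True:
        match = None
        for cls in TRANSITIONS[state]:
            # Try longer affixes first so "ning" wins over "ni", etc.
            for affix in sorted(AFFIX_CLASSES[cls], key=len, reverse=True):
                if stem.endswith(affix) and len(stem) - len(affix) >= MIN_STEM_LEN:
                    match = (cls, affix)
                    break
            if match:
                break
        if match is None:
            # No further transition applies: what remains is taken as the stem.
            return stem, stripped
        cls, affix = match
        stem = stem[: -len(affix)]
        stripped.append((affix, cls))
        state = cls


if __name__ == "__main__":
    # "kitoblarimizdan" = kitob + lar + imiz + dan ("from our books")
    print(strip_affixes("kitoblarimizdan"))
```

Because no vocabulary is consulted, a sketch like this can over-strip a root that happens to end in a suffix-like string; the minimum stem length is only a crude safeguard chosen for illustration.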
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Fuzzy Logic and Control Systems
