Building Morphological Chains for Agglutinative Languages
Serkan Ozen, Burcu Can

TL;DR
This paper enhances unsupervised morphological segmentation for agglutinative languages by expanding candidate generation in a log-linear model, improving segmentation F-measure for both Turkish and English.
Contribution
It introduces an extended candidate generation method for the MorphoChains model, boosting segmentation performance on agglutinative languages.
Findings
12% improvement in Turkish F-measure to 72%
3% improvement in English F-measure to 74%
Outperforms MorphoChains and other well-known unsupervised segmentation systems
Abstract
In this paper, we build morphological chains for agglutinative languages by using a log-linear model for the morphological segmentation task. The model is based on the unsupervised morphological segmentation system called MorphoChains. We extend the MorphoChains log-linear model by expanding the candidate space recursively to cover more split points for agglutinative languages such as Turkish, whereas in the original model candidates are generated by considering only a binary segmentation of each word. The results show that we improve the state-of-the-art Turkish scores by 12%, reaching an F-measure of 72%, and we improve the English scores by 3%, reaching an F-measure of 74%. Overall, the system outperforms both MorphoChains and other well-known unsupervised morphological segmentation systems. The results indicate that candidate generation plays an important role in such an unsupervised log-linear…
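The contrast between binary and recursively expanded candidates can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions:

- It considers suffixes only, with no prefixes or spelling transformations.
- It imposes a hypothetical minimum stem length `min_stem`.
- It uses a made-up suffix-indicator feature set with hand-picked weights.

Under those assumptions, `binary_candidates` yields one split point per candidate. `recursive_candidates` re-splits each parent, so multi-suffix analyses such as Turkish `ev + ler + de` enter the candidate set. `log_linear` normalizes scores in the generic form P(c | w) ∝ exp(θ · φ(w, c)).

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only.
Segmentation = List[str]  # stem followed by zero or more suffixes
Features = Dict[str, float]


def binary_candidates(word: str, min_stem: int = 2) -> List[Tuple[str, str]]:
    """Binary candidates: a single split point, word = parent + suffix."""
    return [(word[:i], word[i:]) for i in range(min_stem, len(word))]


def recursive_candidates(word: str, min_stem: int = 2) -> List[Segmentation]:
    """Expanded candidates: recursively re-split each parent, so every
    combination of split points (with stem length >= min_stem) is covered."""
    results: List[Segmentation] = [[word]]  # keep the unsegmented word too
    for parent, suffix in binary_candidates(word, min_stem):
        for seg in recursive_candidates(parent, min_stem):
            results.append(seg + [suffix])
    return results


def suffix_features(seg: Segmentation) -> Features:
    """Hypothetical feature function: a count for each suffix string."""
    feats: Features = {}
    for suffix in seg[1:]:
        key = "suffix=" + suffix
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def log_linear(cands: List[Segmentation], weights: Dict[str, float],
               featurize: Callable[[Segmentation], Features]) -> List[float]:
    """Softmax over candidates: exp(theta . phi(c)) / sum_c' exp(theta . phi(c'))."""
    scores = [sum(weights.get(f, 0.0) * v for f, v in featurize(c).items())
              for c in cands]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    word = "evlerde"  # Turkish "in the houses": ev + ler + de
    print(binary_candidates(word))  # 5 two-part splits only
    cands = recursive_candidates(word)
    weights = {"suffix=ler": 2.0, "suffix=de": 2.0}  # made-up weights
    probs = log_linear(cands, weights, suffix_features)
    best = cands[probs.index(max(probs))]
    print(len(cands), best)  # 32 ['ev', 'ler', 'de']
```

Under these assumptions, the expanded set contains 2^(n - min_stem) segmentations for a word of length n > min_stem. A practical system would therefore likely need pruning or feature-driven restrictions. How the paper bounds this growth is not stated in this summary.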
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
