Morphological Analysis of Japanese Hiragana Sentences using the BI-LSTM CRF Model

Jun Izutsu; Kanako Komiya

arXiv:2201.03366·cs.CL·January 11, 2022

Open Access

TL;DR

This paper introduces a neural morphological analyzer for Japanese Hiragana sentences using a Bi-LSTM CRF model, addressing the challenges posed by limited information and lack of word delimiters.

Contribution

It presents a novel approach employing fine-tuning of a Bi-LSTM CRF model specifically for Hiragana, and analyzes the impact of different training data genres.

Findings

01. Fine-tuning improves morphological analysis accuracy for Hiragana sentences.

02. Training data genre significantly affects model performance.

03. The method outperforms baseline approaches on Hiragana text analysis.

Abstract

This study proposes a method for developing neural morphological analyzers for Japanese Hiragana sentences using the Bi-LSTM CRF model. Morphological analysis is a technique that divides text into words and assigns information such as parts of speech to them. It plays an essential role in downstream applications of Japanese natural language processing because Japanese does not mark word boundaries with delimiters. Hiragana is a Japanese phonographic script used in texts for children or for people who cannot read Chinese characters. Morphological analysis of Hiragana sentences is more difficult than that of ordinary Japanese sentences because there is less information for deciding where to divide the text. For morphological analysis of Hiragana sentences, we demonstrated the effectiveness of fine-tuning using a model based on ordinary Japanese text and…
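To make the task concrete, the sketch below shows one common way to frame joint word segmentation and part-of-speech assignment as character-level sequence labeling, which is the kind of input/output a Bi-LSTM CRF tagger works with. It is an illustration only: the B/I tag scheme, the POS labels, the function names `encode` and `decode`, and the example segmentation are assumptions for this sketch, not details taken from the paper.

```python
from typing import List, Tuple

# A morpheme is a (surface form, part-of-speech label) pair.
# The POS labels used here are illustrative, not the paper's tag set.
Morpheme = Tuple[str, str]


def encode(morphemes: List[Morpheme]) -> Tuple[List[str], List[str]]:
    """Turn a segmented sentence into per-character tags.

    The first character of each word gets "B-<POS>", the rest "I-<POS>".
    """
    chars: List[str] = []
    tags: List[str] = []
    for surface, pos in morphemes:
        for i, ch in enumerate(surface):
            chars.append(ch)
            tags.append(("B-" if i == 0 else "I-") + pos)
    return chars, tags


def decode(chars: List[str], tags: List[str]) -> List[Morpheme]:
    """Rebuild (surface, POS) pairs from per-character tags."""
    morphemes: List[Morpheme] = []
    for ch, tag in zip(chars, tags):
        prefix, pos = tag.split("-", 1)
        if prefix == "B" or not morphemes:
            # A "B" tag (or a stray leading "I") starts a new word.
            morphemes.append((ch, pos))
        else:
            # An "I" tag extends the current word; keep the POS of its first character.
            surface, first_pos = morphemes[-1]
            morphemes[-1] = (surface + ch, first_pos)
    return morphemes


# A well-known Hiragana-only phrase with a hand-written segmentation.
example: List[Morpheme] = [
    ("すもも", "NOUN"), ("も", "PARTICLE"),
    ("もも", "NOUN"), ("も", "PARTICLE"),
    ("もも", "NOUN"), ("の", "PARTICLE"),
    ("うち", "NOUN"),
]

chars, tags = encode(example)
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

The long run of identical characters in this phrase shows why Hiragana-only text gives a tagger little surface evidence for where one word ends and the next begins. In a setup like this, the tagger would predict the tag sequence and `decode` would turn it back into words with parts of speech.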

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Conditional Random Field