Extracting linguistic speech patterns of Japanese fictional characters using subword units
Mika Kishino, Kanako Komiya

TL;DR
This paper proposes a subword unit-based segmentation method for analyzing Japanese anime and game characters' speech patterns, outperforming traditional morphological analyzers in capturing character-specific expressions.
Contribution
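It introduces a novel subword-unit segmentation approach for character speech analysis. The approach addresses a specific weakness of existing morphological analyzers: they cannot segment broken expressions or utterance endings that are missing from their dictionaries but common in anime and game dialogue.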
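Subword segmentation lets frequent character sequences become units without a dictionary. A minimal byte-pair encoding (BPE) learner shows this. It is a sketch under assumptions: the summary does not say which subword tool or settings the authors used. Plain character-level BPE stands in here, and the names `learn_bpe`, `merge_pair`, and `segment`, along with the toy lines, are hypothetical.

```python
from collections import Counter


def merge_pair(seq: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace each non-overlapping occurrence of `pair` in `seq` with one joined unit."""
    out: list[str] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(lines: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merge rules (hypothetical helper), starting from characters."""
    # No dictionary and no morphological analysis: every line starts as single characters.
    corpus = [list(line) for line in lines]
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent unit pairs across all lines.
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # every line is already a single unit
        # Merge the most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.__getitem__)
        merges.append(best)
        corpus = [merge_pair(seq, best) for seq in corpus]
    return merges


def segment(line: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a new line by replaying the learned merges in order."""
    seq = list(line)
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


# Toy lines from a hypothetical character who ends sentences with のだ.
lines = ["そうなのだ", "いくのだ", "たべるのだ", "ねるのだ"]
merges = learn_bpe(lines, num_merges=2)
print(merges)                        # [('の', 'だ'), ('る', 'のだ')]
print(segment("はしるのだ", merges))  # ['は', 'し', 'るのだ']
```

After one merge, the ending のだ is already a single unit even though no dictionary lists it. Greedy frequency-based merging is what lets recurring endings emerge. Production tools such as SentencePiece add options (for example a unigram language-model variant) that this sketch omits.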
Findings
Subword units reveal character-specific linguistic patterns.
The proposed method outperforms conventional segmentation in classification tasks.
Analysis shows gender, age, and character-specific speech features.
Abstract
This study extracted and analyzed the linguistic speech patterns that characterize Japanese anime or game characters. Conventional morphological analyzers, such as MeCab, achieve high word-segmentation performance, but they cannot segment broken expressions or utterance endings that are not listed in their dictionaries, and such expressions often appear in the lines of anime or game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime or game characters into subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain expressions that characterize their utterances. We analyzed the subword units weighted by TF-IDF according to gender, age, and individual character, and show that they are linguistic speech patterns specific to each feature. Additionally, a classification experiment shows that the model with subword…
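The abstract weights subword units by TF-IDF per gender, age, and character. One common way to do this is to pool each group's segmented lines into a single document and score units against the other groups. This is a sketch under assumptions. The paper's exact TF-IDF variant (smoothing, normalization) is not given here, so this uses relative term frequency times idf = log(N / df). The functions `tfidf_by_group` and `top_units` and the group data are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_group(group_units: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score every subword unit per group; each group's units form one document (assumed setup)."""
    n_groups = len(group_units)
    # Document frequency: the number of groups in which each unit occurs.
    df = Counter()
    for units in group_units.values():
        df.update(set(units))
    scores: dict[str, dict[str, float]] = {}
    for group, units in group_units.items():
        tf = Counter(units)
        total = len(units)
        # Relative term frequency times idf = log(N / df); shared-by-all units score 0.
        scores[group] = {
            unit: (count / total) * math.log(n_groups / df[unit])
            for unit, count in tf.items()
        }
    return scores


def top_units(scores: dict[str, dict[str, float]], group: str, k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-weighted units for one group (stable order on ties)."""
    return sorted(scores[group].items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical, already-segmented lines for two characters.
groups = {
    "character_A": ["そう", "な", "のだ", "いく", "のだ"],
    "character_B": ["そう", "です", "わ", "いく", "です", "わ"],
}
scores = tfidf_by_group(groups)
print(top_units(scores, "character_A", k=2))  # のだ ranks first, then な
print(top_units(scores, "character_B", k=2))  # です and わ tie for first
```

Units that every group uses receive zero weight, so the ranking surfaces the endings that separate the groups. With real data, each line would first be segmented into subword units and then grouped by the speaker attribute of interest.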
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
