Mind Your Moras: Orthography-Aware Error Analysis of Neural Japanese Morphological Generation
Wen Zhang

TL;DR
This paper analyzes how orthographic properties of hiragana affect neural models of Japanese past-tense inflection, showing that the errors remaining after training are systematic and tied to specific features of the script.
Contribution
It introduces an orthography-aware error taxonomy of seven primary failure modes for Japanese past-tense inflection and shows that high aggregate accuracy can mask systematic, orthography-linked errors.
Findings
Models exhibit systematic errors related to orthographic properties of hiragana.
Gemination-related errors account for 75-80% of residual errors, particularly in a class of verbs whose stems end in the vowel e.
Error patterns are consistent across different architectures and seeds.
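To make the gemination finding concrete, here is a minimal sketch of how one might flag gemination-related errors in hiragana output. It is not the paper's taxonomy or code: the function names, the rule (a prediction counts as a gemination error if it differs from the gold form only in the small っ), and the toy forms are all assumptions chosen for illustration.

```python
# Hypothetical sketch: flag errors that differ from the gold form only in the
# sokuon (small tsu), which marks gemination in hiragana.
SOKUON = "っ"


def strip_sokuon(form: str) -> str:
    """Remove every sokuon character from a hiragana string."""
    return form.replace(SOKUON, "")


def is_gemination_error(gold: str, pred: str) -> bool:
    """True if pred is wrong and differs from gold only by inserted,
    deleted, or misplaced sokuon characters (an assumed definition)."""
    return gold != pred and strip_sokuon(gold) == strip_sokuon(pred)


def gemination_share(pairs: list[tuple[str, str]]) -> float:
    """Fraction of erroneous (gold, pred) pairs that are gemination errors."""
    errors = [(gold, pred) for gold, pred in pairs if gold != pred]
    if not errors:
        return 0.0
    hits = sum(is_gemination_error(gold, pred) for gold, pred in errors)
    return hits / len(errors)


# Invented toy predictions, not data from the paper.
toy_pairs = [
    ("たべた", "たべった"),  # spurious sokuon inserted
    ("かった", "かた"),      # required sokuon dropped
    ("よんだ", "よった"),    # wrong consonant alternation, not gemination
    ("みた", "みた"),        # correct prediction, ignored in the share
]
share = gemination_share(toy_pairs)
```

Computing the share over erroneous predictions only, rather than over all predictions, mirrors the way the finding is stated: as a proportion of residual errors, not of the full test set.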
Abstract
We present an orthography-aware error analysis of Japanese past-tense morphological inflection, treating hiragana not merely as a transcriptional medium, but as a representational system encoding morphophonological distinctions that may influence model generalization. We evaluate two character-level sequence-to-sequence architectures on past-tense formation using datasets formatted according to the SIGMORPHON 2020 and 2023 shared task conventions. Despite high aggregate accuracy, models exhibit systematic, linguistically interpretable errors that cluster around specific orthographic properties of hiragana. We introduce a concise error taxonomy capturing seven primary failure modes and provide both quantitative and qualitative analyses. Gemination-related errors dominate residual failures, accounting for 75-80% of errors, particularly in verbs whose stems end in the vowel e and require…
