Composing RNNs and FSTs for Small Data: Recovering Missing Characters in Old Hawaiian Text
Oiwi Parker Jones, Brendan Shillingford

TL;DR
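This paper presents a hybrid method that combines finite state transducers (FSTs) with recurrent neural networks (RNNs) to automatically restore the long-vowel and glottal-stop characters missing from old Hawaiian texts, a setting where too little data is available to train an end-to-end deep learning model.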
Contribution
It introduces a hybrid approach that approximately composes an FST with an RNN, and shows that it outperforms an end-to-end FST method on this small-data transliteration task.
Findings
Hybrid approach outperforms end-to-end FST in accuracy.
Partitioning the problem improves model performance.
Method effectively handles low-resource transliteration.
Abstract
In contrast to the older writing system of the 19th century, modern Hawaiian orthography employs characters for long vowels and glottal stops. These extra characters account for about one-third of the phonemes in Hawaiian, so including them makes a big difference to reading comprehension and pronunciation. However, transliterating between older and newer texts is a laborious task when performed manually. We introduce two related methods to help solve this transliteration problem automatically, given that there were not enough data to train an end-to-end deep learning model. One method is implemented, end-to-end, using finite state transducers (FSTs). The other is a hybrid deep learning approach which approximately composes an FST with a recurrent neural network (RNN). We find that the hybrid approach outperforms the end-to-end FST by partitioning the original problem into one part that…
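To make the transliteration problem concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the modern long-vowel mark is the macron (kahakō) and the glottal stop letter is the ʻokina (U+02BB). The function names `to_old_orthography` and `modern_candidates` are hypothetical, and the rule that an ʻokina may only appear word-initially or after a vowel is a simplifying assumption. Mapping modern text to the older spelling is a deterministic deletion, while the reverse direction yields many candidates per word, which is why some model is needed to choose among them.

```python
import unicodedata
from itertools import product

OKINA = "\u02bb"   # glottal stop letter (assumed encoding: U+02BB)
MACRON = "\u0304"  # combining macron marking a long vowel
VOWELS = "aeiou"


def to_old_orthography(modern: str) -> str:
    """Drop the glottal stop letter and vowel-length marks from modern text."""
    decomposed = unicodedata.normalize("NFD", modern)
    stripped = "".join(ch for ch in decomposed if ch not in (OKINA, MACRON))
    return unicodedata.normalize("NFC", stripped)


def modern_candidates(old: str) -> list[str]:
    """Enumerate modern spellings that collapse to `old`.

    Toy enumeration only: the number of candidates grows exponentially
    with the number of vowels, so this is not practical for long strings.
    """
    options = []
    for i, ch in enumerate(old):
        if ch.lower() not in VOWELS:
            options.append([ch])
            continue
        long_vowel = unicodedata.normalize("NFC", ch + MACRON)
        choices = [ch, long_vowel]
        prev = old[i - 1].lower() if i > 0 else ""
        # Simplifying assumption: a glottal stop is only inserted at the
        # start of a word or after another vowel, never after a consonant.
        if prev == "" or prev in VOWELS or not prev.isalpha():
            choices += [OKINA + ch, OKINA + long_vowel]
        options.append(choices)
    return ["".join(parts) for parts in product(*options)]


if __name__ == "__main__":
    print(to_old_orthography("Hawai\u02bbi"))
    for candidate in modern_candidates("pau"):
        print(candidate)
```

Under these assumptions, the three-letter old spelling "pau" already expands to eight modern candidates. In the setup the abstract describes, an FST would play the role of the candidate generator and a second component would select among the candidates; here the enumeration only illustrates the size of the ambiguity.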
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
