A Deep Investigation of RNN and Self-attention for the Cyrillic-Traditional Mongolian Bidirectional Conversion
Muhan Na, Rui Liu, Feilong, Guanglai Gao

TL;DR
This paper compares RNN and Transformer models for bidirectional conversion between Cyrillic and Traditional Mongolian, demonstrating that both outperform traditional models, with Transformers achieving the best results.
Contribution
This is the first comprehensive study to apply RNN and Transformer models to Mongolian script conversion, showing that both are more effective than traditional methods.
Findings
Transformers outperform RNNs and traditional models.
The Transformer reduces word error rate (WER) by over 5% in both conversion directions.
Deep comparison of network configurations enhances understanding of model performance.
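The findings above are reported in terms of WER. As a minimal sketch, assuming the usual definition (word-level edit distance between the reference and the system output, divided by the number of reference words), WER could be computed as below. The function name and the whitespace tokenization are illustrative assumptions; the paper's exact scoring setup is not given in this summary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. The paper's actual
    tokenization and scoring tool may differ.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Mongolian data:
# one substitution (b -> x) and one deletion (d) over 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition a lower score is better, and the score can exceed 1.0 when the output contains many extra words.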
Abstract
Cyrillic and Traditional Mongolian are the two main members of the Mongolian writing system. The Cyrillic-Traditional Mongolian Bidirectional Conversion (CTMBC) task comprises two conversion processes: Cyrillic Mongolian to Traditional Mongolian (C2T) and Traditional Mongolian to Cyrillic Mongolian (T2C). Previous researchers adopted the traditional joint sequence model, since the CTMBC task is a natural Sequence-to-Sequence (Seq2Seq) modeling problem. Recent studies have shown that Recurrent Neural Network (RNN) and Self-attention (or Transformer) based encoder-decoder models bring significant improvements in machine translation between some major languages, such as Mandarin, English, and French. However, whether the RNN and Transformer models can improve CTMBC quality remains an open problem. To answer this question, this…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Mathematics, Computing, and Information Processing
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Softmax · Dropout · Adam · Dense Connections · Residual Connection
