idT5: Indonesian Version of Multilingual T5 Transformer
Mukhlish Fuadi, Adhi Dharma Wibawa, Surya Sumpeno

TL;DR
This paper introduces idT5, a smaller, Indonesian-specific version of the multilingual T5 transformer, achieving comparable performance on NLP tasks while reducing size, memory usage, and inference time.
Contribution
The study adapts the mT5 model specifically for Indonesian, creating a smaller, more efficient transformer model that maintains high performance across multiple NLP tasks.
Findings
idT5 achieves 77.18% accuracy on sentiment analysis.
Model size is reduced by up to 58%.
Inference speed and memory usage are significantly improved.
Abstract
The Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, but it is under-represented in NLP (Natural Language Processing) research. A sparsity of language resources has hampered previous work on Indonesian. The Transformer is a new architecture that is rapidly becoming dominant in NLP, surpassing alternatives such as convolutional and recurrent neural networks. T5 (Text-to-Text Transfer Transformer) is a Transformer model that converts all text-based language problems to a text-to-text format for English. Its multilingual variant, mT5 (multilingual T5), has shown promising results on many NLP tasks across languages. However, the size of this multilingual model is a drawback for real production applications, which sometimes require only one language. In this study, the mT5 model was adapted for only one language,…
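The text-to-text framing mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a classification example might be cast as an input string and a target string; the `sentiment` task prefix, the `positive` label, the helper names, and the sample sentence are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class TextToTextExample:
    """One training pair in text-to-text form: both sides are plain strings."""
    source: str
    target: str


def to_text_to_text(task_prefix: str, text: str, label: str) -> TextToTextExample:
    # Hypothetical helper: prepend a task prefix to the input text and
    # express the label as a string the model is trained to generate.
    return TextToTextExample(
        source=f"{task_prefix}: {text.strip()}",
        target=label.strip(),
    )


# Illustrative Indonesian sentence ("The film is very good"); not from the paper.
example = to_text_to_text("sentiment", "Filmnya bagus sekali", "positive")
print(example.source)
print(example.target)
```

Because every task is reduced to string pairs like this, a single sequence-to-sequence model can in principle be fine-tuned on several tasks without task-specific output layers.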
Taxonomy
Topics: Educational Technology Systems · Topic Modeling · Data Mining and Machine Learning Applications
Methods: Gated Linear Unit · Attention Is All You Need · SentencePiece · Adafactor · Linear Layer · Residual Connection · Attention Dropout · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer
