idT5: Indonesian Version of Multilingual T5 Transformer

Mukhlish Fuadi; Adhi Dharma Wibawa; Surya Sumpeno

arXiv:2302.00856·cs.CL·October 28, 2024·5 cites

TL;DR

This paper introduces idT5, a smaller, Indonesian-specific version of the multilingual T5 transformer, achieving comparable performance on NLP tasks while reducing size, memory usage, and inference time.

Contribution

The study adapts the mT5 model specifically for Indonesian, creating a smaller, more efficient transformer model that maintains high performance across multiple NLP tasks.
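
As an illustration only: this summary does not say how the adaptation was done, so the sketch below assumes one common approach to specializing a multilingual model for a single language, namely keeping only the embedding rows for token ids that actually occur in a target-language corpus and remapping ids accordingly. The function names (`select_vocab`, `prune_embeddings`, `size_reduction`), the toy embedding table, and the token ids are all hypothetical and are not taken from the paper or from any library.

```python
from collections import Counter


def select_vocab(corpus_token_ids, special_ids, min_count=1):
    """Return the sorted original token ids to keep.

    corpus_token_ids: list of tokenized sentences (lists of int ids).
    special_ids: ids that must always survive (padding, end-of-sequence, ...).
    """
    counts = Counter(tid for sentence in corpus_token_ids for tid in sentence)
    keep = set(special_ids)
    keep.update(tid for tid, c in counts.items() if c >= min_count)
    return sorted(keep)


def prune_embeddings(embedding, keep_ids):
    """Slice the embedding table and build the old-id -> new-id map."""
    pruned = [embedding[old_id] for old_id in keep_ids]
    remap = {old_id: new_id for new_id, old_id in enumerate(keep_ids)}
    return pruned, remap


def size_reduction(old_params, new_params):
    """Fraction of parameters removed, e.g. 0.5 means half the size."""
    return 1.0 - new_params / old_params


# Toy setup (assumed values): 10 tokens, 4-dimensional embeddings.
embedding = [[float(i)] * 4 for i in range(10)]
corpus = [[3, 5, 3], [7, 5]]  # pretend these ids come from Indonesian text

keep_ids = select_vocab(corpus, special_ids=[0, 1])
pruned, remap = prune_embeddings(embedding, keep_ids)

# Sentences must be re-encoded with the new, compact ids.
reencoded = [[remap[tid] for tid in sentence] for sentence in corpus]

old_params = len(embedding) * len(embedding[0])
new_params = len(pruned) * len(pruned[0])
print("kept ids:", keep_ids)
print("re-encoded corpus:", reencoded)
print("embedding reduction:", size_reduction(old_params, new_params))
```

In a sketch like this the transformer layers are untouched and only the vocabulary-dependent tables shrink, which is why savings of this kind tend to be large for multilingual models whose embedding matrices cover many languages. Whether idT5 follows exactly this procedure should be checked against the full paper.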

Findings

01. idT5 achieves 77.18% accuracy on sentiment analysis.
02. Model size is reduced by up to 58%.
03. Inference speed and memory usage are significantly improved.

Abstract

The Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, but it is under-represented in NLP (Natural Language Processing) research. A sparsity of language resources has hampered previous work on Indonesian. The Transformer is a new architecture rapidly becoming dominant for NLP, surpassing alternatives like convolutional and recurrent neural networks. T5 (Text-to-Text Transfer Transformer) is a Transformer model that converts all text-based language problems to text-to-text format for English. The multilingual variant is mT5 (multilingual T5), which has shown promising results on many NLP tasks across languages. However, the size of this multilingual model is a drawback for its application in real production applications, which sometimes require only one language. In this study, the mT5 model was adapted for only one language,…

Taxonomy

Topics: Educational Technology Systems · Topic Modeling · Data Mining and Machine Learning Applications

Methods: Gated Linear Unit · Attention Is All You Need · SentencePiece · Adafactor · Linear Layer · Residual Connection · Attention Dropout · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer