Semi-Supervised Low-Resource Style Transfer of Indonesian Informal to Formal Language with Iterative Forward-Translation
Haryo Akbarianto Wibowo, Tatag Aziz Prawiro, Muhammad Ihsan, Alham Fikri Aji, Radityo Eko Prasojo, Rahmad Mahendra, Suci Fitriany

TL;DR
This paper tackles the challenge of converting informal Indonesian language to formal style using low-resource machine translation techniques, introducing a new dataset and comparing different models including phrase-based MT and GPT-2.
Contribution
The study presents a new parallel dataset for informal-formal Indonesian and evaluates multiple style transfer strategies in a low-resource setting, highlighting the effectiveness of phrase-based MT and GPT-2.
Findings
Phrase-based machine translation outperforms Transformer-based models in low-resource settings.
Fine-tuned GPT-2 achieves comparable results to phrase-based MT but requires more computational resources.
Artificial forward-translated data improves style transfer performance.
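The forward-translation finding can be illustrated with a minimal sketch. It assumes the usual reading of iterative forward-translation: train on the gold parallel pairs, translate unlabeled informal sentences with the current model, add the resulting synthetic pairs to the training set, and retrain. The word-substitution "model", the function names, and the example sentences are all hypothetical stand-ins chosen so the loop is runnable; they are not the paper's phrase-based MT or GPT-2 systems, nor its dataset.

```python
from collections import Counter, defaultdict

# Illustrative data only (not taken from the paper's dataset).
PARALLEL = [
    ("gue udah makan", "saya sudah makan"),
    ("dia nggak tahu", "dia tidak tahu"),
]
UNLABELED = ["gue nggak tahu"]


def train_word_model(pairs):
    """Learn the most frequent formal word for each informal word.

    Toy stand-in for a real translation model; assumes each pair is
    aligned one-to-one (same number of tokens on both sides).
    """
    counts = defaultdict(Counter)
    for informal, formal in pairs:
        src, tgt = informal.split(), formal.split()
        if len(src) != len(tgt):
            continue  # skip pairs this toy aligner cannot handle
        for s, t in zip(src, tgt):
            counts[s][t] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def translate(model, sentence):
    """Replace each known informal word; copy unknown words unchanged."""
    return " ".join(model.get(w, w) for w in sentence.split())


def iterative_forward_translation(parallel, unlabeled, rounds=2):
    """Retrain on gold pairs plus model-translated (synthetic) pairs."""
    model = train_word_model(parallel)
    for _ in range(rounds):
        # Forward-translate unlabeled source text with the current model.
        synthetic = [(s, translate(model, s)) for s in unlabeled]
        # Retrain from scratch on gold plus synthetic data.
        model = train_word_model(parallel + synthetic)
    return model


model = iterative_forward_translation(PARALLEL, UNLABELED)
print(translate(model, "gue nggak tahu"))  # prints "saya tidak tahu"
```

With a lookup-table model the synthetic pairs only reinforce what was already learned; the sketch shows the shape of the training loop, while any gain of the kind reported above would come from a model that can generalize from the extra target-side text.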
Abstract
In its daily use, the Indonesian language is riddled with informality, that is, deviations from the standard in terms of vocabulary, spelling, and word order. On the other hand, currently available Indonesian NLP models are typically developed with standard Indonesian in mind. In this work, we address style transfer from informal to formal Indonesian as a low-resource machine translation problem. We build a new dataset of parallel sentences of informal Indonesian and its formal counterpart. We benchmark several strategies to perform style transfer from informal to formal Indonesian. We also explore augmenting the training set with artificial forward-translated data. Since we are dealing with an extremely low-resource setting, we find that a phrase-based machine translation approach outperforms the Transformer-based approach. Alternatively, a pre-trained GPT-2 fine-tuned to this…
Taxonomy
Methods: Linear Layer · Cosine Annealing · Attention Dropout · Layer Normalization · Adam · Weight Decay · Discriminative Fine-Tuning · Attention Is All You Need · Linear Warmup With Cosine Annealing
