SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia
Zhixiang Lu, Chong Zhang, Yulong Li, Angelos Stefanidis, Anh Nguyen, Imran Razzak, Jionglong Su, Zhengyong Jiang

TL;DR
SAGE is an energy-efficient, culturally attuned translation framework for low-resource Southeast Asian languages. It uses reinforcement learning to curate a small but effective training set, and it reports state-of-the-art results while cutting data usage by 97.1% and energy consumption by 95.2%.
Contribution
This work pioneers an energy-aware data curation method using reinforcement learning to improve low-resource language translation while minimizing environmental costs.
Findings
Achieves state-of-the-art BLEU-4 and COMET-22 scores.
Reduces data usage by 97.1% and energy consumption by 95.2%.
Effectively captures local linguistic nuances.
Abstract
The vision of an inclusive World Wide Web is impeded by a severe linguistic divide, particularly for communities in low-resource regions of Southeast Asia. While large language models (LLMs) offer a potential solution for translation, their deployment in data-poor contexts faces a dual challenge: the scarcity of high-quality, culturally relevant data and the prohibitive energy costs of training on massive, noisy web corpora. To resolve the tension between digital inclusion and environmental sustainability, we introduce Sustainable Agent-Guided Expert-tuning (SAGE). This framework pioneers an energy-aware paradigm that prioritizes the "right data" over "big data". Instead of carbon-intensive training on unfiltered datasets, SAGE employs a reinforcement learning (RL) agent, optimized via Group Relative Policy Optimization (GRPO), to autonomously curate a compact training set. The agent…
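The abstract names Group Relative Policy Optimization (GRPO) as the optimizer for the curation agent. As a rough illustration of the group-relative idea only, the sketch below normalizes a group of rewards against that group's own mean and standard deviation, which is the advantage signal GRPO uses in place of a learned value baseline. The function name, the `eps` term, the use of population standard deviation, and the reward values are all assumptions made for illustration; the excerpt above does not describe SAGE's actual reward or policy.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to the other samples in its group.

    Hypothetical helper: rewards above the group mean get a positive
    advantage, rewards below it get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, not from the paper
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four candidate data selections sampled as one group
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, a selection is reinforced only when it beats the other candidates sampled alongside it, not when it merely scores well in absolute terms.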
Taxonomy
Topics: ICT in Developing Communities · Computational and Text Analysis Methods · Topic Modeling
