LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators

Leanne Tan; Gabriel Chua; Ziyu Ge; Roy Ka-Wei Lee

arXiv:2507.15339 · cs.CL · September 30, 2025

TL;DR

LionGuard 2 is a lightweight, multilingual content moderation classifier tailored to the Singapore context. It combines pre-trained embeddings with local data, without fine-tuning a large model, outperforms several commercial and open-source moderation systems, and is actively deployed within the Singapore Government.

Contribution

The paper introduces LionGuard 2, a lightweight multilingual moderation classifier that pairs pre-trained OpenAI embeddings with a multi-head ordinal classifier and local data, reaching strong moderation performance without fine-tuning a large model.

Findings

01. Outperforms several commercial and open-source moderation systems across 17 benchmarks, covering both Singapore-specific and public English datasets.

02. Supports English, Chinese, and Malay, with partial support for Tamil.

03. Is actively deployed within the Singapore Government.

Abstract

Modern moderation systems increasingly support multiple languages, but often fail to address localisation and low-resource variants, creating safety gaps in real-world deployments. Small models offer a potential alternative to large LLMs, yet still demand considerable data and compute. We present LionGuard 2, a lightweight, multilingual moderation classifier tailored to the Singapore context, supporting English, Chinese, Malay, and partial Tamil. Built on pre-trained OpenAI embeddings and a multi-head ordinal classifier, LionGuard 2 outperforms several commercial and open-source systems across 17 benchmarks, including both Singapore-specific and public English datasets. The system is actively deployed within the Singapore Government, demonstrating practical efficacy at scale. Our findings show that high-quality local data and robust multilingual embeddings can achieve strong moderation…
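The abstract describes the design only as frozen pre-trained embeddings feeding a multi-head ordinal classifier. The sketch below illustrates that general idea and is not the authors' implementation: the category names, the number of severity levels, the embedding size, and the `OrdinalHead` class are all hypothetical, the weights are random rather than trained, and random vectors stand in for real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# All of these values are illustrative assumptions, not taken from the paper.
EMBED_DIM = 16                                   # real embedding vectors are far larger
CATEGORIES = ["hateful", "insults", "sexual"]    # hypothetical head names
NUM_LEVELS = 3                                   # hypothetical: 0 = safe, 1 = mild, 2 = severe


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OrdinalHead:
    """One category head: a linear score compared against K-1 ordered cut points.

    Output column k is an estimate of P(level > k), so the columns are
    non-increasing from left to right by construction.
    """

    def __init__(self, dim, num_levels, rng):
        self.w = rng.normal(scale=0.1, size=dim)
        # Sorting the cut points is what enforces the ordinal structure.
        self.cuts = np.sort(rng.normal(size=num_levels - 1))

    def exceed_probs(self, emb):
        score = emb @ self.w                                   # shape (batch,)
        return sigmoid(score[:, None] - self.cuts[None, :])    # shape (batch, K-1)

    def predict_level(self, emb):
        # Predicted level = number of cut points the score exceeds.
        return (self.exceed_probs(emb) > 0.5).sum(axis=1)


# One independent head per category, all reading the same frozen embedding.
heads = {name: OrdinalHead(EMBED_DIM, NUM_LEVELS, rng) for name in CATEGORIES}

# Placeholder for embeddings returned by a pre-trained embedding model.
embeddings = rng.normal(size=(4, EMBED_DIM))

levels = {name: head.predict_level(embeddings) for name, head in heads.items()}
for name, lvl in levels.items():
    print(name, lvl.tolist())
```

Because only the small heads would be trained while the embedding model stays frozen, a design of this shape needs far fewer trainable parameters than fine-tuning a large model, which is consistent with the "lightweight" and "data-efficient" framing in the title.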

Code & Models

Models

govtech/lionguard-2 (Hugging Face model · 245 downloads · 1 like)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Spam and Phishing Detection