LionGuard: Building a Contextualized Moderation Classifier to Tackle Localized Unsafe Content

Jessica Foo; Shaun Khoo

arXiv:2407.10995 · cs.CL · July 22, 2024

TL;DR

LionGuard is a Singapore-specific moderation classifier that improves unsafe-content detection for local language varieties such as Singlish, outperforms generic moderation APIs, and underscores the importance of localization in moderation tools.

Contribution

We introduce LionGuard, a localized moderation classifier tailored to the Singaporean context, and show significant performance gains over existing non-localized moderation APIs.

Findings

01

LionGuard outperforms existing moderation APIs on Singlish data by 14% (binary) and up to 51% (multi-label).

02

Localization enhances moderation accuracy for low-resource languages.

03

The approach is practical and scalable for other low-resource languages.

Abstract

As large language models (LLMs) become increasingly prevalent in a wide variety of applications, concerns about the safety of their outputs have become more significant. Most efforts at safety-tuning or moderation today take on a predominantly Western-centric view of safety, especially for toxic, hateful, or violent speech. In this paper, we describe LionGuard, a Singapore-contextualized moderation classifier that can serve as guardrails against unsafe LLM outputs. When assessed on Singlish data, LionGuard outperforms existing widely-used moderation APIs, which are not finetuned for the Singapore context, by 14% (binary) and up to 51% (multi-label). Our work highlights the benefits of localization for moderation classifiers and presents a practical and scalable approach for low-resource languages.
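The abstract reports results in two settings: binary (is the text unsafe at all?) and multi-label (which kinds of harm apply?). The sketch below illustrates how scores from such a classifier might be turned into both decisions and used as a guardrail on LLM outputs. It assumes some upstream classifier has already produced probabilities; the category names, the 0.5 thresholds, the fallback message, and the functions `moderate` and `guard` are all hypothetical and are not LionGuard's actual labels or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical label set for illustration only; this page does not list
# the categories LionGuard actually predicts.
CATEGORIES = ("hateful", "harassment", "self_harm", "sexual", "toxic", "violent")


@dataclass
class ModerationResult:
    unsafe: bool                                      # binary decision
    flagged: List[str] = field(default_factory=list)  # multi-label decision


def moderate(binary_score: float,
             category_scores: Dict[str, float],
             binary_threshold: float = 0.5,
             category_thresholds: Optional[Dict[str, float]] = None) -> ModerationResult:
    """Turn classifier probabilities into binary and multi-label decisions.

    The scores are assumed to come from an upstream classifier; this
    function only applies (hypothetical) thresholds to them.
    """
    thresholds = {c: 0.5 for c in CATEGORIES}
    if category_thresholds:
        thresholds.update(category_thresholds)
    flagged = []
    for name, score in category_scores.items():
        if name not in thresholds:
            raise ValueError(f"unknown category: {name}")
        if score >= thresholds[name]:
            flagged.append(name)
    return ModerationResult(unsafe=binary_score >= binary_threshold, flagged=flagged)


def guard(llm_output: str, result: ModerationResult,
          fallback: str = "Sorry, I can't help with that.") -> str:
    """Guardrail: pass the LLM output through only if the binary check is safe."""
    return fallback if result.unsafe else llm_output


if __name__ == "__main__":
    res = moderate(0.91, {"hateful": 0.82, "harassment": 0.10, "toxic": 0.64})
    print(res.unsafe, res.flagged)         # True ['hateful', 'toxic']
    print(guard("some model reply", res))  # Sorry, I can't help with that.
```

Keeping the binary decision separate from the per-category flags mirrors the two evaluation settings: a deployment might block on the binary score alone while logging the multi-label flags for review. Per-category thresholds are exposed because, in practice, different harm types may warrant different precision/recall trade-offs.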

Code & Models

Models

govtech/lionguard-v1 (Hugging Face)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection