Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning

Huan Ma; Changqing Zhang; Huazhu Fu; Peilin Zhao; Bingzhe Wu

arXiv:2310.03400 · cs.LG · March 8, 2024 · 6 cites

Open Access

TL;DR

This paper explores fine-tuning large language models for content moderation, highlighting data engineering challenges, the benefits of reasoning processes, and methods to prevent overfitting in private deployments.

Contribution

It introduces a comprehensive process for fine-tuning LLMs for content moderation, emphasizing reasoning integration to reduce overfitting without requiring reasoning outputs during deployment.
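
To make the idea of training with reasoning while deploying without it more concrete, here is a minimal Python sketch. It is not taken from the paper: the prompt template, the `Example` fields, and the verdict-first target layout are all assumptions chosen for illustration. The sketch builds a supervised fine-tuning record whose target states the verdict and then the reasoning, so that at deployment only the first generated line needs to be read.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One annotated moderation sample (hypothetical schema)."""
    text: str       # content under review
    label: str      # e.g. "violating" or "compliant"
    reasoning: str  # analysis written by an annotator or a stronger model


# Hypothetical prompt template; the page does not specify one.
PROMPT = (
    "Review the following content and decide whether it is compliant.\n"
    "Content: {text}\n"
    "Verdict:"
)


def build_sft_record(ex: Example) -> dict:
    """Build a prompt/completion pair for supervised fine-tuning.

    Assumption: the target puts the verdict first and the reasoning after,
    so the reasoning is learned during training but can be skipped later.
    """
    return {
        "prompt": PROMPT.format(text=ex.text),
        "completion": f" {ex.label}\nReasoning: {ex.reasoning}",
    }


def parse_verdict(generated: str) -> str:
    """At deployment, read only the first generated line as the verdict."""
    stripped = generated.strip()
    return stripped.splitlines()[0].strip() if stripped else ""
```

With this layout, generation at deployment could be stopped at the first newline, so the reasoning text never has to be produced. Whether the verdict or the reasoning should come first is a design choice that the summary above does not settle.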

Findings

01. Reasoning during fine-tuning alleviates overfitting.

02. Fine-tuning LLMs improves content moderation accuracy.

03. A complete pipeline, from data collection to model training, is provided.

Abstract

Nowadays, billions of people engage in communication and express their opinions on the internet daily. Unfortunately, not all of these expressions are friendly or compliant, making content moderation an indispensable task. A common approach is to use a discriminative model to classify the content, but this method often requires strict data engineering, otherwise it will face unacceptable overfitting. With the successful development of Large Language Models (LLMs) in recent years, LLM-based methods have become a feasible solution for handling tasks in various domains. Thanks to the knowledge of the foundation models, we can develop more robust privately deployed models with limited data via fine-tuning these foundation models. Moreover, as a generative model, it can provide detailed analysis of the review process, enhancing interpretability. In this paper, we introduce how to fine-tune a…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection