Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation of LLM

Chi Zhang; Changjia Zhu; Junjie Xiong; Xiaoran Xu; Lingyao Li; Yao Liu; Zhuo Lu

arXiv:2508.05775 · cs.CL · August 14, 2025

TL;DR

This survey reviews recent research on harmful content generation by LLMs, analyzing risks, mitigation strategies, and safety challenges to guide future development of ethically aligned language models.

Contribution

It provides a unified taxonomy of LLM-related harms and defenses, and analyzes emerging jailbreak strategies and mitigation techniques.

Findings

1. Identifies limitations in current evaluation methodologies.

2. Highlights the effectiveness of reinforcement learning from human feedback (RLHF).

3. Outlines future research directions for LLM safety.

Abstract

Large Language Models (LLMs) have revolutionized content creation across digital platforms, offering unprecedented capabilities in natural language generation and understanding. These models enable beneficial applications such as content generation, question answering (Q&A), programming, and code reasoning. At the same time, they pose serious risks by inadvertently or intentionally producing toxic, offensive, or biased content. This dual role of LLMs, both as powerful tools for solving real-world problems and as potential sources of harmful language, presents a pressing sociotechnical challenge. In this survey, we systematically review recent studies spanning unintentional toxicity, adversarial jailbreaking attacks, and content moderation techniques. We propose a unified taxonomy of LLM-related harms and defenses, analyze emerging multimodal and LLM-assisted jailbreak strategies, and…
