A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

Huandong Wang; Wenjie Fu; Yingzhou Tang; Zhilong Chen; Yuxi Huang; Jinghua Piao; Chen Gao; Fengli Xu; Tao Jiang; Yong Li

arXiv:2501.09431·cs.AI·January 17, 2025·3 cites

Open Access

TL;DR

This survey reviews recent strategies across all stages of LLM development to mitigate risks like privacy leakage, hallucinations, and malicious use, aiming to promote responsible deployment.

Contribution

It offers a comprehensive, unified framework covering multiple dimensions of responsible LLMs, unlike previous surveys focusing on single aspects.

Findings

01. Advances in privacy protection techniques.

02. Methods for reducing hallucinations and toxicity.

03. Strategies for defending against jailbreak attacks.

Abstract

While large language models (LLMs) present significant potential for supporting numerous real-world applications and delivering positive social impacts, they still face major challenges: the inherent risks of privacy leakage, hallucinated outputs, and value misalignment, as well as malicious use for generating toxic content and for unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on the recent advances for enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast…

Taxonomy

Topics: Digital Rights Management and Security · Blockchain Technology Applications and Security · Cloud Data Security Solutions