A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
Huandong Wang, Wenjie Fu, Yingzhou Tang, Zhilong Chen, Yuxi Huang, Jinghua Piao, Chen Gao, Fengli Xu, Tao Jiang, Yong Li

TL;DR
This survey reviews recent strategies across all stages of LLM development to mitigate risks like privacy leakage, hallucinations, and malicious use, aiming to promote responsible deployment.
Contribution
It offers a comprehensive, unified framework covering multiple dimensions of responsible LLMs, unlike previous surveys focusing on single aspects.
Findings
Advances in privacy protection techniques.
Methods for reducing hallucinations and toxicity.
Strategies for defending against jailbreak attacks.
Abstract
While large language models (LLMs) show significant potential for supporting numerous real-world applications and delivering positive social impact, they still face significant challenges from the inherent risks of privacy leakage, hallucinated outputs, and value misalignment, and they can be maliciously used to generate toxic content or serve unethical purposes after being jailbroken. Therefore, in this survey, we present a comprehensive review of recent advancements aimed at mitigating these issues, organized across the four phases of LLM development and usage: data collection and pre-training, fine-tuning and alignment, prompting and reasoning, and post-processing and auditing. We elaborate on the recent advances for enhancing the performance of LLMs in terms of privacy protection, hallucination reduction, value alignment, toxicity elimination, and jailbreak defenses. In contrast…
Taxonomy
Topics: Digital Rights Management and Security · Blockchain Technology Applications and Security · Cloud Data Security Solutions
