AI Safety Landscape for Large Language Models: Taxonomy, State-of-the-art, and Future Directions

Chen Chen, Xueluan Gong, Ziyao Liu, Weifeng Jiang, Si Qi Goh, and Kwok-Yan Lam

arXiv:2408.12935·cs.AI·May 14, 2026·6 cites

TL;DR

This paper presents a comprehensive framework and review of AI safety, focusing on large language models, highlighting challenges, mitigation strategies, and future research directions to ensure safe AI deployment.

Contribution

It introduces a novel architectural framework for AI safety, categorizing it into Trustworthy, Responsible, and Safe AI, and reviews state-of-the-art safety techniques for large language models.

Findings

01. Identifies key challenges in AI safety for LLMs

02. Reviews current mitigation approaches and methodologies

03. Proposes future directions for AI safety research

Abstract

AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems. With the rapid proliferation of AI and especially with the recent advancement of Generative AI (or GAI), the technology ecosystem behind the design, development, adoption, and deployment of AI systems has drastically changed, broadening the scope of AI Safety to address impacts on public safety and national security. In this paper, we propose a novel architectural framework for understanding and analyzing AI Safety, defining its characteristics from three perspectives: Trustworthy AI, Responsible AI, and Safe AI. We provide an extensive review of current research and advancements in AI safety from these perspectives, highlighting their key challenges and mitigation approaches. Through examples from state-of-the-art technologies, particularly Large Language Models (LLMs), we present…
