TeleAI-Safety: A comprehensive LLM jailbreaking benchmark towards attacks, defenses, and evaluations

Xiuyuan Chen; Jian Zhao; Yuxiang He; Yuan Xun; Xinwei Liu; Yanshu Li; Huilin Zhou; Wei Cai; Ziyan Shi; Yuchen Yuan; Tianle Zhang; Chi Zhang; Xuelong Li

arXiv:2512.05485·cs.CR·December 9, 2025

TeleAI-Safety: A comprehensive LLM jailbreaking benchmark towards attacks, defenses, and evaluations

Xiuyuan Chen, Jian Zhao, Yuxiang He, Yuan Xun, Xinwei Liu, Yanshu Li, Huilin Zhou, Wei Cai, Ziyan Shi, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li

PDF

Open Access

TL;DR

TeleAI-Safety introduces a comprehensive, modular benchmark framework for systematically evaluating large language model safety against diverse attacks, defenses, and evaluation methods, addressing current limitations in consistency and reproducibility.

Contribution

It provides a unified, reproducible framework with extensive attack, defense, and evaluation methods, enabling rigorous and flexible LLM safety assessments.

Findings

01

Identified systematic vulnerabilities in LLMs

02

Revealed trade-offs between safety and utility

03

Highlighted defense patterns for future improvements

Abstract

While the deployment of large language models (LLMs) in high-value industries continues to expand, the systematic assessment of their safety against jailbreak and prompt-based attacks remains insufficient. Existing safety evaluation benchmarks and frameworks are often limited by an imbalanced integration of core components (attack, defense, and evaluation methods) and an isolation between flexible evaluation frameworks and standardized benchmarking capabilities. These limitations hinder reliable cross-study comparisons and create unnecessary overhead for comprehensive risk assessment. To address these gaps, we present TeleAI-Safety, a modular and reproducible framework coupled with a systematic benchmark for rigorous LLM safety evaluation. Our framework integrates a broad collection of 19 attack methods (including one self-developed method), 29 defense methods, and 19 evaluation methods…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Information and Cyber Security · Smart Grid Security and Resilience