TeleAI-Safety: A comprehensive LLM jailbreaking benchmark towards attacks, defenses, and evaluations
Xiuyuan Chen, Jian Zhao, Yuxiang He, Yuan Xun, Xinwei Liu, Yanshu Li, Huilin Zhou, Wei Cai, Ziyan Shi, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li

TL;DR
TeleAI-Safety introduces a comprehensive, modular benchmark framework for systematically evaluating large language model safety against diverse attacks, defenses, and evaluation methods, addressing current limitations in consistency and reproducibility.
Contribution
It provides a unified, reproducible framework with extensive attack, defense, and evaluation methods, enabling rigorous and flexible LLM safety assessments.
Findings
Identified systematic vulnerabilities in LLMs
Revealed trade-offs between safety and utility
Highlighted defense patterns for future improvements
Abstract
While the deployment of large language models (LLMs) in high-value industries continues to expand, the systematic assessment of their safety against jailbreak and prompt-based attacks remains insufficient. Existing safety evaluation benchmarks and frameworks are often limited by an imbalanced integration of core components (attack, defense, and evaluation methods) and an isolation between flexible evaluation frameworks and standardized benchmarking capabilities. These limitations hinder reliable cross-study comparisons and create unnecessary overhead for comprehensive risk assessment. To address these gaps, we present TeleAI-Safety, a modular and reproducible framework coupled with a systematic benchmark for rigorous LLM safety evaluation. Our framework integrates a broad collection of 19 attack methods (including one self-developed method), 29 defense methods, and 19 evaluation methods…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Information and Cyber Security · Smart Grid Security and Resilience
