AVISE: Framework for Evaluating the Security of AI Systems
Mikko Lempinen, Joni Kemppainen, Niklas Raesalmi

TL;DR
AVISE is an open-source framework designed to systematically evaluate and identify security vulnerabilities in AI systems, demonstrated through a new attack method and a security evaluation test for language models.
Contribution
Introduces AVISE, a modular framework for AI security assessment, including a novel attack extension and an automated jailbreak vulnerability testing tool.
Findings
All evaluated language models are vulnerable to the augmented Red Queen attack.
The Security Evaluation Test (SET) achieved 92% accuracy and an F1-score of 0.91.
AVISE enables reproducible and extensible AI security evaluations.
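For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and F1-score are conventionally computed for a binary "jailbroken / not jailbroken" verdict. It is not taken from AVISE: the function name `accuracy_and_f1` and the toy verdict lists are hypothetical, and the summary does not state how the authors' reference labels were obtained.

```python
def accuracy_and_f1(predicted, actual):
    """Return (accuracy, f1) for binary verdicts.

    predicted: verdicts from an automated evaluator (True = jailbreak detected)
    actual:    reference labels for the same cases (True = jailbreak occurred)
    Both names are illustrative and are not part of the AVISE API.
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Toy data only, chosen for illustration; these are not results from the paper.
toy_predicted = [True, True, False, False, True]
toy_actual = [True, False, False, False, True]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_predicted, toy_actual)
```

Accuracy counts every correct verdict equally, while F1 depends only on how well the positive (jailbreak) class is detected, which is why the two are usually reported together when the classes may be imbalanced.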
Abstract
As artificial intelligence (AI) systems are increasingly deployed across critical domains, their security vulnerabilities pose growing risks of high-profile exploits and consequential system failures. Yet systematic approaches to evaluating AI security remain underdeveloped. In this paper, we introduce AVISE (AI Vulnerability Identification and Security Evaluation), a modular open-source framework for identifying vulnerabilities in and evaluating the security of AI systems and models. As a demonstration of the framework, we extend the theory-of-mind-based multi-turn Red Queen attack into an Adversarial Language Model (ALM) augmented attack and develop an automated Security Evaluation Test (SET) for discovering jailbreak vulnerabilities in language models. The SET comprises 25 test cases and an Evaluation Language Model (ELM) that determines whether each test case was able to jailbreak…
