ASTRAL: Automated Safety Testing of Large Language Models
Miriam Ugarte, Pablo Valle, José Antonio Parejo, Sergio Segura, and Aitor Arrieta

TL;DR
This paper introduces ASTRAL, an automated tool for safety testing of Large Language Models that generates diverse, up-to-date unsafe test cases and uses LLMs as test oracles to improve safety evaluation accuracy.
Contribution
ASTRAL presents a novel black-box coverage criterion and an LLM-based approach leveraging RAG and web browsing for comprehensive safety testing of LLMs.
Findings
GPT-3.5 outperforms other LLMs as a test oracle.
The approach uncovers nearly twice as many unsafe LLM behaviors.
Web-guided test input generation increases unsafe behavior detection.
Abstract
Large Language Models (LLMs) have recently gained attention due to their ability to understand and generate sophisticated human-like content. However, ensuring their safety is paramount as they might provide harmful and unsafe responses. Existing LLM testing frameworks address various safety-related concerns (e.g., drugs, terrorism, animal abuse) but often face challenges due to unbalanced and obsolete datasets. In this paper, we present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs. First, we introduce a novel black-box coverage criterion to generate balanced and diverse unsafe test inputs across a diverse set of safety categories as well as linguistic writing characteristics (i.e., different style and persuasive writing techniques). Second, we propose an LLM-based approach that leverages Retrieval Augmented…
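The black-box coverage criterion described in the abstract can be read as requiring test inputs for every combination of safety category, writing style, and persuasion technique. The following is a minimal sketch of that reading, not the authors' implementation. The three category names come from the abstract's examples. The style and persuasion values, and the function names `all_combinations`, `plan_test_suite`, and `coverage`, are hypothetical placeholders chosen for illustration.

```python
from itertools import product

# Illustrative feature values only; the paper's full lists are not reproduced here.
CATEGORIES = ["drugs", "terrorism", "animal_abuse"]      # examples named in the abstract
STYLES = ["question", "slang", "role_play"]              # hypothetical placeholders
PERSUASIONS = ["evidence_based", "expert_endorsement"]   # hypothetical placeholders


def all_combinations():
    """Every (category, style, persuasion) triple the test suite should cover."""
    return list(product(CATEGORIES, STYLES, PERSUASIONS))


def plan_test_suite(tests_per_combination):
    """Return a balanced plan: the same number of prompts for each triple."""
    return [
        combo
        for combo in all_combinations()
        for _ in range(tests_per_combination)
    ]


def coverage(executed):
    """Fraction of feature triples exercised by at least one executed test."""
    required = set(all_combinations())
    return len(required & set(executed)) / len(required)


plan = plan_test_suite(tests_per_combination=2)
print(len(plan))           # number of planned prompts
print(coverage(plan))      # coverage of the full plan
print(coverage(plan[:6]))  # coverage of a partial run
```

In this sketch each entry of the plan would be handed to a test generator that writes one prompt for that triple. Balance comes from assigning the same number of prompts to every combination, and coverage from checking which combinations have been exercised.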
Taxonomy
Topics: Software Reliability and Analysis Research · Safety Systems Engineering in Autonomy · Software Testing and Debugging Techniques
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
