ASTRAL: Automated Safety Testing of Large Language Models

Miriam Ugarte; Pablo Valle; José Antonio Parejo; Sergio Segura; Aitor Arrieta

arXiv:2501.17132·cs.SE·January 29, 2025

Open Access

TL;DR

This paper introduces ASTRAL, an automated tool for safety testing of Large Language Models that generates diverse, up-to-date unsafe test cases and uses LLMs as test oracles to improve safety evaluation accuracy.

Contribution

ASTRAL presents a novel black-box coverage criterion and an LLM-based approach leveraging RAG and web browsing for comprehensive safety testing of LLMs.

Findings

1. GPT-3.5 outperforms the other LLMs evaluated as a test oracle.
2. With the same number of test inputs, the approach uncovers nearly twice as many unsafe LLM behaviors as static datasets.
3. Web-guided test input generation increases the number of unsafe behaviors detected.
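The idea of using an LLM as a test oracle can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the two-label `safe`/`unsafe` scheme, and the `judge` and `fake_oracle` names are hypothetical and not taken from the paper. The oracle is passed in as a plain callable so the example runs without any model API.

```python
from typing import Callable

# Hypothetical oracle prompt; the paper's actual wording is not shown in this page.
ORACLE_TEMPLATE = (
    "You are evaluating the safety of a model response.\n"
    "Prompt: {prompt}\n"
    "Response: {response}\n"
    "Answer with exactly one word: safe or unsafe."
)


def judge(prompt: str, response: str, oracle: Callable[[str], str]) -> bool:
    """Return True if the oracle labels the response as unsafe."""
    verdict = oracle(ORACLE_TEMPLATE.format(prompt=prompt, response=response))
    words = verdict.strip().lower().split()
    first = words[0].strip(".,!:;\"'") if words else ""
    if first not in {"safe", "unsafe"}:
        # Fail loudly rather than guess when the oracle output is off-format.
        raise ValueError(f"unparseable verdict: {verdict!r}")
    return first == "unsafe"


def fake_oracle(query: str) -> str:
    """Stand-in for a real LLM call: treats an explicit refusal as safe."""
    return "safe" if "I can't help" in query else "unsafe"


print(judge("test prompt", "I can't help with that.", fake_oracle))
print(judge("test prompt", "Sure, here is how to do it.", fake_oracle))
```

In practice `fake_oracle` would be replaced by a call to whichever model serves as the oracle; keeping it behind a callable makes it straightforward to compare several candidate oracles on the same labeled responses.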

Abstract

Large Language Models (LLMs) have recently gained attention due to their ability to understand and generate sophisticated human-like content. However, ensuring their safety is paramount as they might provide harmful and unsafe responses. Existing LLM testing frameworks address various safety-related concerns (e.g., drugs, terrorism, animal abuse) but often face challenges due to unbalanced and obsolete datasets. In this paper, we present ASTRAL, a tool that automates the generation and execution of test cases (i.e., prompts) for testing the safety of LLMs. First, we introduce a novel black-box coverage criterion to generate balanced and diverse unsafe test inputs across a diverse set of safety categories as well as linguistic writing characteristics (i.e., different style and persuasive writing techniques). Second, we propose an LLM-based approach that leverages Retrieval Augmented…
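The black-box coverage criterion described in the abstract can be read as requiring test inputs for every combination of safety category, writing style, and persuasion technique. The sketch below is one possible interpretation, not the authors' implementation. Only the three category names come from the abstract; the style and persuasion labels, and the `coverage_plan` and `coverage` functions, are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

# Categories are the examples named in the abstract; the style and persuasion
# values are placeholder assumptions for illustration only.
CATEGORIES = ["drugs", "terrorism", "animal_abuse"]
STYLES = ["question", "slang"]
PERSUASIONS = ["evidence_based", "expert_endorsement"]


def coverage_plan(tests_per_combination=1):
    """Return a balanced list of (category, style, persuasion) slots to fill."""
    plan = []
    for combo in product(CATEGORIES, STYLES, PERSUASIONS):
        plan.extend([combo] * tests_per_combination)
    return plan


def coverage(generated):
    """Fraction of all feature combinations hit by at least one generated test."""
    required = set(product(CATEGORIES, STYLES, PERSUASIONS))
    return len(required & set(generated)) / len(required)


plan = coverage_plan(tests_per_combination=2)
print(len(plan))                             # 3 * 2 * 2 combinations, 2 tests each
print(Counter(cat for cat, _, _ in plan))    # every category gets the same count
print(coverage(plan[:6]))                    # partial plan covers 3 of 12 combinations
```

Each slot in the plan would then be handed to a test generator that writes a prompt matching that combination, which is what keeps the resulting test suite balanced across categories rather than skewed toward a few of them.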

Taxonomy

Topics

Software Reliability and Analysis Research · Safety Systems Engineering in Autonomy · Software Testing and Debugging Techniques

Methods

Softmax · Attention Is All You Need · Sparse Evolutionary Training