STELLAR: A Search-Based Testing Framework for Large Language Model Applications
Lev Sorokin, Ivan Vasilev, Ken E. Friedl, Andrea Stocco

TL;DR
STELLAR is an automated search-based testing framework that systematically uncovers unsafe or incorrect responses in large language model applications by modeling test generation as an optimization problem and using evolutionary algorithms.
Contribution
STELLAR introduces an evolutionary optimization approach that systematically explores combinations of input features to expose failures in LLM-based systems, and it exposes more failures than the baseline methods it is compared against.
Findings
Exposes up to 4.3 times more failures than baseline methods.
Effectively identifies unsafe and incorrect responses in various LLM applications.
Demonstrates applicability across different domains and system types.
Abstract
Large Language Model (LLM)-based applications are increasingly deployed across various domains, including customer service, education, and mobility. However, these systems are prone to inaccurate, fictitious, or harmful responses, and their vast, high-dimensional input space makes systematic testing particularly challenging. To address this, we present STELLAR, an automated search-based testing framework for LLM-based applications that systematically uncovers text inputs leading to inappropriate system responses. Our framework models test generation as an optimization problem and discretizes the input space into stylistic, content-related, and perturbation features. Unlike prior work that focuses on prompt optimization or coverage heuristics, our work employs evolutionary optimization to dynamically explore feature combinations that are more likely to expose failures. We evaluate…
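The search loop the abstract describes (a discretized feature space explored by evolutionary optimization) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature names and values in `FEATURES`, the operators, the population settings, and above all `toy_fitness`, which stands in for the real step of turning a feature combination into a text input, sending it to the application under test, and scoring how close the response is to a failure.

```python
import random

# Hypothetical discretized input space: one categorical feature per group.
# The paper's actual features and values are not listed in this summary.
FEATURES = {
    "style": ["formal", "casual", "terse"],
    "content": ["navigation", "billing", "medical"],
    "perturbation": ["none", "typos", "filler_words"],
}


def random_individual(rng):
    """Pick one value per feature uniformly at random."""
    return {name: rng.choice(values) for name, values in FEATURES.items()}


def crossover(a, b, rng):
    """Uniform crossover: each feature is inherited from either parent."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in FEATURES}


def mutate(individual, rng, rate=0.2):
    """Resample each feature with probability `rate`."""
    return {
        name: rng.choice(FEATURES[name]) if rng.random() < rate else value
        for name, value in individual.items()
    }


def toy_fitness(individual):
    """Placeholder score (higher = closer to a failure).

    Assumption: a real fitness function would generate an utterance from
    the features, query the LLM application, and judge the response.
    """
    score = 0
    if individual["style"] == "casual":
        score += 3
    if individual["content"] == "medical":
        score += 4
    if individual["perturbation"] == "typos":
        score += 3
    return score


def evolve(fitness, generations=20, pop_size=12, seed=0):
    """Simple elitist genetic algorithm over feature combinations."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated offspring.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness)
    print(best, toy_fitness(best))
```

Because the better half of each generation is carried over unchanged, the best score in the population never decreases between generations. A realistic setup would also have to budget the number of fitness evaluations, since each one costs at least one call to the system under test.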
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
