Assessing the Robustness of LLM-based NLP Software via Automated Testing
Mingxuan Xiao, Yan Xiao, Shunhui Ji, Hanbo Cai, Lei Xue, Pengcheng Zhang

TL;DR
This paper introduces AORTA, an automated framework for testing the robustness of LLM-based NLP software, featuring a novel Adaptive Beam Search (ABS) method that improves testing efficiency and effectiveness.
Contribution
The paper presents the first automated robustness testing framework for LLM-based NLP software and a new Adaptive Beam Search method tailored for large language models.
Findings
ABS achieves an average test success rate of 86.138%.
ABS significantly reduces computational overhead compared to baseline methods.
Test cases generated by ABS are more natural and transferable.
Abstract
Benefiting from the advancements in LLMs, NLP software has undergone rapid development. Such software is widely employed in various safety-critical tasks, such as financial sentiment analysis, toxic content moderation, and log generation. Unlike traditional software, LLM-based NLP software relies on prompts and examples as inputs. Given the complexity of LLMs and the unpredictability of real-world inputs, quantitatively assessing the robustness of such software is crucial. However, to the best of our knowledge, no automated robustness testing methods have been specifically designed to evaluate the overall inputs of LLM-based NLP software. To this end, this paper introduces the first AutOmated Robustness Testing frAmework, AORTA, which reconceptualizes the testing process into a combinatorial optimization problem. Existing testing methods designed for DNN-based software can be applied to…
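To make the "combinatorial optimization" framing concrete, here is a minimal sketch of a search-based robustness test using a plain fixed-width beam search over word substitutions. It is not the paper's ABS: the adaptive mechanism is not described in this summary, so none is implemented. The classifier (toy_label), its word weights, and the synonym table are all hypothetical stand-ins for the software under test and a real lexical resource.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table (assumption, not from the paper).
SYNONYMS: Dict[str, List[str]] = {
    "great": ["fine", "okay"],
    "profit": ["gain", "return"],
}

# Hypothetical word weights for a toy sentiment scorer (assumption).
WEIGHTS: Dict[str, float] = {"great": 3.0, "profit": 2.0, "gain": 1.0}


def toy_score(tokens: List[str]) -> float:
    """Toy stand-in for the software under test: higher means more positive."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)


def toy_label(tokens: List[str]) -> str:
    return "positive" if toy_score(tokens) > 0 else "negative"


def margin(tokens: List[str], label: str) -> float:
    """How strongly the toy model still supports `label` (lower is closer to a flip)."""
    score = toy_score(tokens)
    return score if label == "positive" else -score


def beam_search_test(
    original: List[str], beam_width: int = 2, max_steps: int = 3
) -> Optional[List[str]]:
    """Search for a perturbed input whose predicted label differs from the original's."""
    target = toy_label(original)
    beam = [original]
    seen = {tuple(original)}
    for _ in range(max_steps):
        candidates: List[List[str]] = []
        for tokens in beam:
            for i, word in enumerate(tokens):
                if word != original[i]:
                    continue  # perturb each position at most once
                for syn in SYNONYMS.get(word, []):
                    cand = tokens[:i] + [syn] + tokens[i + 1:]
                    if tuple(cand) in seen:
                        continue
                    seen.add(tuple(cand))
                    if toy_label(cand) != target:
                        return cand  # label flipped: a successful test case
                    candidates.append(cand)
        if not candidates:
            return None
        # Keep the candidates that weaken the original prediction the most.
        candidates.sort(key=lambda c: margin(c, target))
        beam = candidates[:beam_width]
    return None


result = beam_search_test("great profit this quarter".split())
print(result)
```

In this toy setup no single substitution flips the label, so the search needs a second step that builds on the best first-step candidates. The beam width trades the number of model queries against the chance of finding a flip, which is the kind of cost the Findings refer to as computational overhead.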
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Reliability and Analysis Research
