Evaluation and Improvement of Fault Detection for Large Language Models

Qiang Hu; Jin Wen; Maxime Cordy; Yuheng Huang; Wei Ma; Xiaofei Xie; Lei Ma

arXiv:2404.14419·cs.SE·November 6, 2024

TL;DR

This paper empirically evaluates fault detection methods for large language models across multiple tasks and proposes MuCS, a prompt mutation framework, to significantly improve fault detection coverage.

Contribution

Presents the first empirical study of fault detection effectiveness for LLMs and introduces MuCS, a mutation-based framework that enhances fault detection capabilities.

Findings

01 Simple methods like Margin perform well but have room for improvement.

02 MuCS significantly boosts fault detection coverage, by up to 70.53%.

03 Existing methods can be substantially improved with prompt mutation techniques.
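
The Margin method named in the findings can be illustrated with a minimal sketch. It assumes the definition commonly used in deep learning test selection: the score of an input is the gap between its two largest predicted class probabilities, and inputs with the smallest gap are labeled first because the model is least decisive on them. The function names and the example probabilities are hypothetical, and this page does not state how the paper derives class probabilities from LLM outputs.

```python
from typing import List, Sequence


def margin_score(probs: Sequence[float]) -> float:
    """Gap between the two largest probabilities (assumed Margin definition).

    A small gap means the model barely prefers its top prediction.
    """
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def select_by_margin(all_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` inputs with the smallest margin."""
    scores = [margin_score(p) for p in all_probs]
    # Ascending order: the most uncertain inputs come first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:budget]


# Hypothetical per-input class probabilities, for illustration only.
example_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # very uncertain, smallest margin
    [0.55, 0.44, 0.01],  # two classes nearly tied
]
selected = select_by_margin(example_probs, budget=2)
print(selected)
```

With a labeling budget of two, the sketch picks the two inputs whose top predictions are closest together, so labeling effort goes to the cases most likely to expose a fault.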

Abstract

Large language models (LLMs) have recently achieved significant success across various application domains, garnering substantial attention from different communities. Unfortunately, even for the best LLM, many faults still exist that the LLM cannot predict properly. Such faults harm the usability of LLMs in general and could introduce safety issues in reliability-critical systems such as autonomous driving systems. Quickly revealing these faults in the real-world datasets that an LLM could face is important but challenging. The main reason is that ground truth is necessary, but data labeling is costly in time and human effort. To handle this problem, the conventional deep learning testing field has proposed test selection methods that evaluate deep learning models efficiently by prioritizing faults. However, despite their importance, the…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies

Methods: LLaMA