LeCov: Multi-level Testing Criteria for Large Language Models

Xuan Xie; Jiayang Song; Yuheng Huang; Da Song; Fuyuan Zhang; Felix Juefei-Xu; Lei Ma

arXiv:2408.10474·cs.SE·August 21, 2024

TL;DR

LeCov is a set of multi-level testing criteria for large language models. It targets three internal components (the attention mechanism, feed-forward neurons, and uncertainty) to make trustworthiness testing more systematic and formal.

Contribution

The paper proposes a novel set of nine testing criteria targeting LLM internal components, enabling more thorough and formalized testing for trustworthiness.

Findings

1. LeCov effectively identifies untrustworthy issues in LLMs.
2. The criteria improve test coverage and prioritization.
3. Experimental results show enhanced detection of defects.

Abstract

Large Language Models (LLMs) are widely used in many different domains, but because of their limited interpretability, there are questions about how trustworthy they are from various perspectives, e.g., truthfulness and toxicity. Recent research has started developing testing methods for LLMs, aiming to uncover untrustworthy issues, i.e., defects, before deployment. However, systematic and formalized testing criteria are lacking, which hinders a comprehensive assessment of the extent and adequacy of testing exploration. To mitigate this threat, we propose a set of multi-level testing criteria, LeCov, for LLMs. The criteria consider three crucial LLM internal components, i.e., the attention mechanism, feed-forward neurons, and uncertainty, and contain nine types of testing criteria in total. We apply the criteria in two scenarios: test prioritization and coverage-guided testing. The…
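To make the two application scenarios more concrete, here is a minimal sketch of how a coverage criterion can drive test prioritization. It is not one of LeCov's nine criteria, which this page does not define. It uses a generic threshold-based neuron coverage and a greedy ordering, and it assumes each test input has already been reduced to a flat list of neuron activations. The function names and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def covered_neurons(activations: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Return indices of neurons whose activation exceeds the threshold.

    Assumption: a neuron counts as "covered" when its activation is
    strictly greater than `threshold` (generic criterion, not LeCov's).
    """
    return {i for i, a in enumerate(activations) if a > threshold}


def coverage(rows: Sequence[Sequence[float]], threshold: float = 0.5) -> float:
    """Fraction of neurons covered by at least one test input."""
    if not rows:
        return 0.0
    covered: Set[int] = set()
    for row in rows:
        covered |= covered_neurons(row, threshold)
    return len(covered) / len(rows[0])


def prioritize(rows: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Greedily order tests by how many not-yet-covered neurons each adds.

    Ties are broken in favour of the lower test index.
    """
    remaining: Dict[int, Set[int]] = {
        i: covered_neurons(row, threshold) for i, row in enumerate(rows)
    }
    covered: Set[int] = set()
    order: List[int] = []
    while remaining:
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        covered |= remaining.pop(best)
        order.append(best)
    return order


if __name__ == "__main__":
    # Three hypothetical test inputs over four neurons.
    rows = [
        [0.9, 0.1, 0.1, 0.1],
        [0.9, 0.8, 0.7, 0.1],
        [0.1, 0.1, 0.1, 0.6],
    ]
    print(coverage(rows))
    print(prioritize(rows))
```

In a coverage-guided setting, the same `coverage` value could serve as the feedback signal: a mutated input is kept only if it raises coverage. This again illustrates the general idea rather than the paper's procedure.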

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training