MILE: A Mutation Testing Framework of In-Context Learning Systems

Zeming Wei; Yihao Zhang; Meng Sun

arXiv:2409.04831 · cs.SE · September 10, 2024

TL;DR

This paper introduces MILE, a mutation testing framework tailored for in-context learning systems in large language models, to evaluate the quality and reliability of demonstration data.

Contribution

It presents novel mutation operators and scores specifically designed for ICL demonstrations, enabling systematic assessment of test data effectiveness.

Findings

01. Effective evaluation of ICL test suites demonstrated
02. Mutation scores correlate with demonstration quality
03. Framework improves understanding of ICL robustness

Abstract

In-context learning (ICL) has achieved notable success in applications of large language models (LLMs). By adding only a few input-output pairs that demonstrate a new task, the LLM can efficiently learn the task during inference without modifying the model parameters. This mysterious ability of LLMs has attracted great research interest in understanding, formatting, and improving in-context demonstrations, yet ICL still suffers from drawbacks such as black-box mechanisms and sensitivity to the selection of examples. In this work, inspired by the foundations of adopting testing techniques in machine learning (ML) systems, we propose a mutation testing framework designed to characterize the quality and effectiveness of test data for ICL systems. First, we propose several mutation operators specialized for ICL demonstrations, as well as corresponding mutation scores for ICL test…
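The abstract is cut off before it defines the operators and scores, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a hypothetical label-flip operator (`flip_label`), the conventional mutation score (killed mutants divided by total mutants), and a toy word-overlap classifier (`toy_model`) standing in for an LLM. All names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Demo = Tuple[str, str]                      # (input text, label)
Model = Callable[[Sequence[Demo], str], str]  # (demonstrations, query) -> prediction


def flip_label(demos: Sequence[Demo], index: int, labels: Sequence[str]) -> List[Demo]:
    """Hypothetical mutation operator: give one demonstration a wrong label."""
    text, label = demos[index]
    wrong = next(candidate for candidate in labels if candidate != label)
    mutated = list(demos)
    mutated[index] = (text, wrong)
    return mutated


def mutation_score(model: Model, demos: Sequence[Demo],
                   mutants: Sequence[Sequence[Demo]], tests: Sequence[str]) -> float:
    """Fraction of mutants killed, i.e. whose predictions differ from the
    original demonstrations on at least one test input."""
    if not mutants:
        return 0.0
    baseline = [model(demos, query) for query in tests]
    killed = sum(
        1 for mutant in mutants
        if [model(mutant, query) for query in tests] != baseline
    )
    return killed / len(mutants)


def toy_model(demos: Sequence[Demo], query: str) -> str:
    """Stand-in for an LLM (an assumption): return the label of the
    demonstration sharing the most words with the query."""
    words = set(query.lower().split())
    best = max(demos, key=lambda demo: len(words & set(demo[0].lower().split())))
    return best[1]


demos = [("great movie", "positive"), ("terrible plot", "negative")]
labels = ["positive", "negative"]
mutants = [flip_label(demos, i, labels) for i in range(len(demos))]

# A one-input test suite only exercises the first demonstration.
weak_score = mutation_score(toy_model, demos, mutants, ["great acting"])
# Adding a second input exercises both demonstrations.
strong_score = mutation_score(
    toy_model, demos, mutants, ["great acting", "terrible pacing"]
)
print(weak_score, strong_score)  # 0.5 1.0
```

In this toy setting the single-input suite kills only one of the two mutants, while the two-input suite kills both. That is the sense in which a mutation score can characterize how thoroughly a test suite exercises the demonstrations.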

Code & Models

Repositories

weizeming/mile (PyTorch, official)

Taxonomy

Topics: Teaching and Learning Programming