MILE: A Mutation Testing Framework of In-Context Learning Systems
Zeming Wei, Yihao Zhang, Meng Sun

TL;DR
This paper introduces MILE, a mutation testing framework tailored for in-context learning systems in large language models, to evaluate the quality and reliability of demonstration data.
Contribution
It presents novel mutation operators and scores specifically designed for ICL demonstrations, enabling systematic assessment of test data effectiveness.
Findings
Effective evaluation of ICL test suites demonstrated
Mutation scores correlate with demonstration quality
Framework improves understanding of ICL robustness
Abstract
In-context learning (ICL) has achieved notable success in applications of large language models (LLMs). By adding only a few input-output pairs that demonstrate a new task, an LLM can learn the task efficiently at inference time without modifying its parameters. This mysterious ability has attracted considerable research interest in understanding, formatting, and improving in-context demonstrations, although ICL still suffers from drawbacks such as its black-box mechanism and its sensitivity to the selection of examples. In this work, inspired by the foundations of applying testing techniques to machine learning (ML) systems, we propose a mutation testing framework designed to characterize the quality and effectiveness of test data for ICL systems. First, we propose several mutation operators specialized for ICL demonstrations, as well as corresponding mutation scores for ICL test…
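The abstract does not spell out MILE's operators or score formulas. The sketch below is a rough illustration only. It assumes the classic mutation-testing definition carried over to ICL, where mutation score = killed mutants / total mutants. A mutant is a demonstration set altered by one operator, and a test input "kills" it if the model's prediction changes. The operators `flip_label` and `drop_demo`, and the word-overlap `toy_model` standing in for an LLM, are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Demo = Tuple[str, str]                       # (input text, output label)
Model = Callable[[Sequence[Demo], str], str]


def toy_model(demos: Sequence[Demo], query: str) -> str:
    """Hypothetical stand-in for an LLM: copy the label of the demo sharing
    the most words with the query (ties go to the earliest demo)."""
    if not demos:
        return ""
    q = set(query.lower().split())
    best = max(demos, key=lambda d: len(q & set(d[0].lower().split())))
    return best[1]


def flip_label(demos: List[Demo], i: int, labels: List[str]) -> List[Demo]:
    """Hypothetical operator: replace demo i's label with a different label."""
    x, y = demos[i]
    other = next(l for l in labels if l != y)
    return demos[:i] + [(x, other)] + demos[i + 1:]


def drop_demo(demos: List[Demo], i: int) -> List[Demo]:
    """Hypothetical operator: remove demo i from the prompt."""
    return demos[:i] + demos[i + 1:]


def make_mutants(demos: List[Demo], labels: List[str]) -> List[List[Demo]]:
    """Apply each operator once at every demo position."""
    mutants = [flip_label(demos, i, labels) for i in range(len(demos))]
    mutants += [drop_demo(demos, i) for i in range(len(demos))]
    return mutants


def mutation_score(model: Model, demos: List[Demo], tests: List[str],
                   mutants: List[List[Demo]]) -> float:
    """Fraction of mutants for which at least one test input changes the output."""
    if not mutants:
        return 0.0
    baseline = {x: model(demos, x) for x in tests}
    killed = sum(
        1 for m in mutants if any(model(m, x) != baseline[x] for x in tests)
    )
    return killed / len(mutants)


demos = [("great movie", "pos"), ("terrible plot", "neg"), ("loved the acting", "pos")]
labels = ["pos", "neg"]
mutants = make_mutants(demos, labels)
full = mutation_score(toy_model, demos,
                      ["a great film", "the plot was terrible", "acting was loved"], mutants)
weak = mutation_score(toy_model, demos, ["a great film"], mutants)
print(full, weak)
```

Under these assumptions, the three-input suite kills 5 of the 6 mutants while the single-input suite kills only 2. This is the kind of gap a mutation score is meant to expose when comparing the adequacy of two test suites.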
Taxonomy
Topics: Teaching and Learning Programming
