Active Slice Discovery in Large Language Models
Minhui Zhang, Prahar Ijner, Yoav Wald, Elliot Creager

TL;DR
This paper introduces Active Slice Discovery, an approach for efficiently identifying error slices (subsets of data on which a large language model errs systematically). It actively groups errors that are likely to share a pattern and uses limited annotator access to verify them, which reduces annotation effort and clarifies where the model fails.
Contribution
The paper formalizes Active Slice Discovery and empirically demonstrates its effectiveness in toxicity classification, highlighting the success of uncertainty-based active learning algorithms.
Findings
Active Slice Discovery needs annotations for only 2-10% of the data.
Uncertainty-based active learning algorithms outperform the baselines.
The method effectively identifies error slices in toxicity classification.
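To make the idea of uncertainty-based active learning for slice discovery concrete, here is a minimal sketch, not the authors' implementation. It assumes each model error already has a feature vector, that slice membership is a binary label supplied by an annotator, and that a small logistic regression serves as the slice-membership classifier. The synthetic 2-D data, the `oracle` function, the budget, and all helper names are hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    # Estimated probability that example x belongs to the slice.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logreg(X, y, epochs=200, lr=0.5):
    # Plain batch gradient descent on the logistic loss.
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errs = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

def active_slice_discovery(features, oracle, seed_idx, budget):
    # Start from a few annotated errors, then repeatedly ask the annotator
    # about the unlabeled error whose predicted membership is closest to 0.5.
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(budget):
        idx = list(labeled)
        w, b = fit_logreg([features[i] for i in idx], [labeled[i] for i in idx])
        pool = [i for i in range(len(features)) if i not in labeled]
        if not pool:
            break
        query = min(pool, key=lambda i: abs(predict_proba(w, b, features[i]) - 0.5))
        labeled[query] = oracle(query)
    # Refit on everything annotated and score every error.
    idx = list(labeled)
    w, b = fit_logreg([features[i] for i in idx], [labeled[i] for i in idx])
    return [predict_proba(w, b, x) for x in features], labeled

# Hypothetical toy data: 100 errors inside the slice, 100 outside it.
random.seed(0)
truth = [1] * 100 + [0] * 100
features = [[random.gauss(2.0 if t else -2.0, 0.7),
             random.gauss(2.0 if t else -2.0, 0.7)] for t in truth]
oracle = lambda i: truth[i]  # stands in for the human annotator

scores, labeled = active_slice_discovery(features, oracle, seed_idx=[0, 100], budget=10)
predicted = [int(s > 0.5) for s in scores]
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"annotated {len(labeled)} of {len(truth)} errors, accuracy {accuracy:.2f}")
```

The toy clusters are deliberately well separated, so the number printed here says nothing about the paper's results. The sketch only shows the shape of the loop: fit a membership classifier, query the most uncertain error, repeat until the annotation budget is spent.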
Abstract
Large Language Models (LLMs) often exhibit systematic errors on specific subsets of data, known as error slices. For instance, a slice can correspond to a certain demographic, where a model does poorly in identifying toxic comments regarding that demographic. Identifying error slices is crucial to understanding and improving models, but it is also challenging. An appealing approach to reduce the amount of manual annotation required is to actively group errors that are likely to belong to the same slice, while using limited access to an annotator to verify whether the chosen samples share the same pattern of model mistake. In this paper, we formalize this approach as Active Slice Discovery and explore it empirically on a problem of discovering human-defined slices in toxicity classification. We examine the efficacy of active slice discovery under different choices of feature…
Taxonomy
Topics: Topic Modeling · Authorship Attribution and Profiling · Sentiment Analysis and Opinion Mining
