Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?

Leon Weber-Genzel, Robert Litschko, Ekaterina Artemova, and Barbara Plank

arXiv:2309.01669·cs.CL·February 23, 2024

TL;DR

This paper introduces DONKII, a novel benchmark for annotation error detection in instruction-tuning datasets for language models, demonstrating the importance of AED methods in improving data quality and model performance.

Contribution

It presents the first benchmark dataset for AED in instruction tuning, along with a taxonomy of error types and evaluation of AED methods for language generation tasks.

Findings

01. All three datasets contain clear errors that can propagate into models.

02. Model size and the choice of AED method significantly affect error detection performance.

03. The paper offers practical recommendations for using AED to improve instruction-tuning data.

Abstract

Instruction tuning has become an integral part of training pipelines for Large Language Models (LLMs) and has been shown to yield strong performance gains. In an orthogonal line of research, Annotation Error Detection (AED) has emerged as a tool for detecting quality problems in gold standard labels. So far, however, the application of AED methods has been limited to classification tasks. It is an open question how well AED methods generalize to language generation settings, which are becoming more widespread via LLMs. In this paper, we present a first and novel benchmark for AED on instruction tuning data: DONKII. It comprises three instruction-tuning datasets enriched with error annotations by experts and semi-automatic methods. We also provide a novel taxonomy of error types for instruction-tuning data. We find that all three datasets contain clear errors, which sometimes propagate…

Code & Models

Repositories

mainlp/donkii
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling