Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation

Franco Marchesoni-Acland; Jean-Michel Morel; Josselin Kherroubi; Gabriele Facciolo

arXiv:2307.01578·cs.LG·July 6, 2023

Open Access

TL;DR

This paper addresses the problem of fully annotating a binary classification dataset with as few yes/no questions as possible when a predictor is available, drawing on coding theory and practical heuristics.

Contribution

The paper presents a spectrum of solutions, from optimal Huffman-based questioning strategies to practical heuristics, for annotating a binary classification dataset with a minimal number of yes/no questions.

Findings

01. Achieves 23-86% improvement in annotation efficiency

02. Proposes Huffman encoding for optimal questioning strategies

03. Validates methods on synthetic and real datasets

Abstract

Even though data annotation is extremely important for interpretability, research and development of artificial intelligence solutions, most research efforts such as active learning or few-shot learning focus on the sample efficiency problem. This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset…
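The Huffman construction mentioned in the abstract can be illustrated on a toy dataset. The following is a minimal sketch, not the authors' implementation: it assumes the predictor gives independent per-sample probabilities of the positive class, and the function names (`labeling_probabilities`, `huffman_depths`, `expected_questions`, `entropy_bits`) are hypothetical. Each merge in the Huffman tree corresponds to one yes/no question about which subset of labelings contains the true one, so the depth of a labeling is the number of questions needed to reach it. The sketch enumerates all 2^n labelings, which is why the approach only works for very small n.

```python
import heapq
import itertools
import math


def labeling_probabilities(probs):
    """Probability of every labeling, assuming independent predictions (an assumption)."""
    table = {}
    for labeling in itertools.product((0, 1), repeat=len(probs)):
        p = 1.0
        for y, q in zip(labeling, probs):
            p *= q if y == 1 else 1.0 - q
        table[labeling] = p
    return table


def huffman_depths(table):
    """Map each labeling to its depth (number of yes/no questions) in a Huffman tree."""
    # Heap entries: (probability, unique tie-breaker, labelings in this subtree).
    heap = [(p, i, [lab]) for i, (lab, p) in enumerate(table.items())]
    heapq.heapify(heap)
    depths = {lab: 0 for lab in table}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, labs1 = heapq.heappop(heap)
        p2, _, labs2 = heapq.heappop(heap)
        # Merging two subtrees adds one question for every labeling below them.
        for lab in labs1 + labs2:
            depths[lab] += 1
        heapq.heappush(heap, (p1 + p2, counter, labs1 + labs2))
        counter += 1
    return depths


def expected_questions(probs):
    """Expected number of yes/no questions under the Huffman strategy."""
    table = labeling_probabilities(probs)
    depths = huffman_depths(table)
    return sum(table[lab] * depths[lab] for lab in table)


def entropy_bits(probs):
    """Entropy of the labeling distribution in bits (lower bound on expected questions)."""
    total = 0.0
    for q in probs:
        for p in (q, 1.0 - q):
            if 0.0 < p < 1.0:
                total -= p * math.log2(p)
    return total


if __name__ == "__main__":
    probs = [0.9, 0.9, 0.9, 0.9]  # hypothetical predictor outputs
    print("one question per sample:", len(probs))
    print("Huffman expected questions:", expected_questions(probs))
    print("entropy lower bound:", entropy_bits(probs))
```

With a confident predictor, the expected number of questions falls below one per sample and stays within one bit of the entropy of the labeling distribution, which is the standard guarantee for Huffman codes. With an uninformative predictor (all probabilities 0.5), nothing is gained over asking about each sample in turn.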

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Data Stream Mining Techniques
