# Bounded Expectation of Label Assignment: Dataset Annotation by Supervised Splitting with Bias-Reduction Techniques

**Authors:** Alyssa Herbst, Bert Huang

arXiv: 1906.07046 · 2021-02-26

## TL;DR

This paper presents BELA (bounded expectation of label assignment), a dataset annotation scheme that partitions data with supervised models, applies bias-reduction strategies, and infers labels by majority vote within each subset, so that only a fraction of the dataset needs manual labeling.

## Contribution

BELA partitions the dataset with supervised models rather than unsupervised clustering, includes strategies to avoid the bias that adaptive partitioning can introduce, and outperforms existing adaptive labeling strategies in the authors' evaluation.

## Key findings

- Outperforms existing adaptive labeling strategies on three datasets.
- Partitions data with supervised models instead of unsupervised clustering, which may not accurately group data by label.
- Infers labels at a fraction of the cost of labeling the entire dataset.

## Abstract

Annotating large unlabeled datasets can be a major bottleneck for machine learning applications. We introduce a scheme for inferring labels of unlabeled data at a fraction of the cost of labeling the entire dataset. Our scheme, bounded expectation of label assignment (BELA), greedily queries an oracle (or human labeler) and partitions a dataset to find data subsets that have mostly the same label. BELA can then infer labels by majority vote of the known labels in each subset. BELA determines whether to split or label from a subset by maximizing a lower bound on the expected number of correctly labeled examples. Our approach differs from existing hierarchical labeling schemes by using supervised models to partition the data, therefore avoiding reliance on unsupervised clustering methods that may not accurately group data by label. We design BELA with strategies to avoid bias that could be introduced through this adaptive partitioning. We evaluate BELA on three datasets and find that it outperforms existing strategies for adaptive labeling.
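The label-inference step described in the abstract can be illustrated with a minimal sketch. It shows majority-vote inference within one subset and a simple lower bound on the number of correctly labeled examples. The abstract does not state the form of BELA's bound, so the Hoeffding-style bound used here is an assumption chosen for illustration, and the function names (`majority_label`, `lower_bound_correct`, `infer_labels`) and the `delta` parameter are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def majority_label(known_labels):
    """Return (label, count) for the most common label among queried labels."""
    label, count = Counter(known_labels).most_common(1)[0]
    return label, count


def lower_bound_correct(known_labels, subset_size, delta=0.05):
    """Illustrative lower bound on correctly labeled examples in a subset.

    ASSUMPTION: uses a one-sided Hoeffding bound on the majority-label
    fraction; the paper's actual bound may differ.
    """
    n = len(known_labels)
    if n == 0:
        return 0.0  # no queried labels, so nothing can be guaranteed
    _, count = majority_label(known_labels)
    p_hat = count / n
    # With probability at least 1 - delta, the true fraction is >= p_hat - slack.
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - slack) * subset_size


def infer_labels(subset):
    """Fill in unlabeled examples of one subset by majority vote.

    `subset` is a list of (example_id, label_or_None) pairs; labels already
    obtained from the oracle are kept unchanged.
    """
    known = [label for _, label in subset if label is not None]
    if not known:
        return {}
    vote, _ = majority_label(known)
    return {ex: (label if label is not None else vote) for ex, label in subset}


if __name__ == "__main__":
    subset = [("x1", "cat"), ("x2", None), ("x3", "cat"), ("x4", "dog")]
    print(infer_labels(subset))
    print(lower_bound_correct(["cat", "cat", "dog"], subset_size=len(subset)))
```

Under this assumed bound, a subset whose queried labels mostly agree yields a higher guaranteed count as more labels are queried, which is the kind of quantity a split-or-label decision could compare across candidate actions.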

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.07046/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1906.07046/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1906.07046/full.md

---
Source: https://tomesphere.com/paper/1906.07046