REAL: A Representative Error-Driven Approach for Active Learning

Cheng Chen; Yong Wang; Lizi Liao; Yueguo Chen; Xiaoyong Du

arXiv:2307.00968 · cs.LG · July 7, 2023

TL;DR

REAL introduces a novel active learning method that focuses on selecting representative pseudo errors based on error density, leading to improved model accuracy and F1 scores in text classification tasks.

Contribution

It proposes a new error-driven sampling approach that considers neighborhood error density, outperforming existing methods in active learning for text classification.

Findings

01. Outperforms baselines in accuracy and F1-macro scores

02. Selects representative pseudo errors matching ground-truth error distribution

03. Effective across various hyperparameter settings

Abstract

Given a limited labeling budget, active learning (AL) aims to sample the most informative instances from an unlabeled pool to acquire labels for subsequent model training. To achieve this, AL typically measures the informativeness of unlabeled instances based on uncertainty and diversity. However, it does not consider erroneous instances with their neighborhood error density, which have great potential to improve the model performance. To address this limitation, we propose REAL, a novel approach to select data instances with Representative Errors for Active Learning. It identifies minority predictions as "pseudo errors" within a cluster and allocates an adaptive sampling budget for the cluster based on estimated error density. Extensive experiments on five text classification datasets demonstrate that REAL consistently…
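The two steps the abstract names (flagging minority predictions in a cluster as pseudo errors, then splitting the labeling budget by estimated error density) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `find_pseudo_errors` and `allocate_budget` are hypothetical, the cluster assignments are assumed to be given (for example from clustering model representations), and the proportional allocation rule with floor rounding is an assumption chosen for simplicity.

```python
from collections import Counter, defaultdict


def find_pseudo_errors(cluster_ids, predictions):
    """Group instance indices by cluster and flag minority predictions.

    Assumption: a "pseudo error" is an instance whose predicted label
    differs from the most common predicted label in its cluster.
    """
    members = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        members[cluster].append(idx)

    errors = {}
    for cluster, idxs in members.items():
        majority_label, _ = Counter(predictions[i] for i in idxs).most_common(1)[0]
        errors[cluster] = [i for i in idxs if predictions[i] != majority_label]
    return dict(members), errors


def allocate_budget(members, errors, budget):
    """Split the budget across clusters by pseudo-error density.

    Assumption: density is (pseudo errors / cluster size) and each cluster
    receives a share proportional to its density, rounded down and capped
    at the number of pseudo errors it actually contains.
    """
    density = {c: len(errors[c]) / len(members[c]) for c in members}
    total = sum(density.values())
    if total == 0:
        # No pseudo errors anywhere: nothing to allocate under this rule.
        return {c: 0 for c in members}
    return {
        c: min(len(errors[c]), int(budget * density[c] / total))
        for c in members
    }
```

Because shares are rounded down, part of the budget can remain unspent in this sketch; a fuller version would need some rule for redistributing the remainder.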

Code & Models

Repositories

withchencheng/ecml_pkdd_23_real
PyTorch · Official

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Topic Modeling