A Novel Two-Step Fine-Tuning Pipeline for Cold-Start Active Learning in Text Classification Tasks
Fabiano Belém, Washington Cunha, Celso França, Claudio Andrade, Leonardo Rocha, Marcos André Gonçalves

TL;DR
This paper introduces DoTCAL, a two-step fine-tuning pipeline leveraging unlabeled data and active learning to improve BERT-based text classification in cold-start scenarios, showing significant performance gains.
Contribution
The paper proposes a novel two-step fine-tuning pipeline, DoTCAL, that reduces labeled data reliance and enhances active learning effectiveness for BERT in cold-start text classification.
Findings
DoTCAL outperforms traditional methods with up to 33% higher Macro-F1.
BoW and LSI sometimes outperform BERT, especially in low-resource tasks.
Using unlabeled data via domain adaptation improves model performance.
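The Macro-F1 metric cited in the findings is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The snippet below is a minimal sketch of that standard definition in plain Python; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical labels): class "a" gets F1 = 0.8, class "b" gets F1 = 2/3.
score = macro_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"])
```

Because each class contributes equally, one poorly classified minority class can lower Macro-F1 sharply even when overall accuracy stays high.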
Abstract
This is the first work to investigate the effectiveness of BERT-based contextual embeddings in active learning (AL) tasks on cold-start scenarios, where traditional fine-tuning is infeasible due to the absence of labeled data. Our primary contribution is the proposal of a more robust fine-tuning pipeline, DoTCAL, that diminishes the reliance on labeled data in AL using two steps: (1) fully leveraging unlabeled data through domain adaptation of the embeddings via masked language modeling and (2) further adjusting model weights using labeled data selected by AL. Our evaluation contrasts BERT-based embeddings with other prevalent text representation paradigms, including Bag of Words (BoW), Latent Semantic Indexing (LSI), and FastText, at two critical stages of the AL process: instance selection and classification. Experiments conducted on eight automatic text classification (ATC) benchmarks with varying AL budgets…
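Step (1) of the pipeline relies on masked language modeling, which needs no labels: some tokens of each unlabeled document are hidden and the model is trained to recover them. The sketch below illustrates only how such training pairs can be built, in plain Python. It is not the authors' code; `mask_tokens`, the 15% default rate, and the whitespace tokenization are assumptions made for illustration, and it omits the refinement used in BERT pre-training where some selected tokens are replaced by a random token or left unchanged instead of being masked.

```python
import random

MASK_TOKEN = "[MASK]"  # BERT's mask symbol; used here purely as a placeholder string


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, targets) for a simplified MLM training example.

    targets[i] holds the original token where position i was masked,
    and None where the token was left visible (no loss at that position).
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            targets.append(token)
        else:
            masked.append(token)
            targets.append(None)
    return masked, targets


# Hypothetical unlabeled sentence, split on whitespace for simplicity.
sentence = "the delivery arrived two weeks late and damaged".split()
masked, targets = mask_tokens(sentence, mask_prob=0.15, seed=0)
```

In a real setup the masked sequences would be fed to a BERT model with a language-modeling head, and only afterwards would step (2) fine-tune the adapted weights on the instances labeled through AL.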
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections · Multi-Head Attention · Residual Connection · Dropout · WordPiece
