When Active Learning Falls Short: An Empirical Study on Chemical Reaction Extraction

Simin Yu; Sufia Fathima

arXiv:2604.19335 · cs.LG · April 22, 2026

TL;DR

This study systematically evaluates active learning strategies for chemical reaction extraction, revealing challenges and insights for improving data efficiency in chemical information tasks.

Contribution

The paper presents a comprehensive analysis of active learning methods integrated with transformer-CRF models for chemical reaction extraction, highlighting task-dependent behavior and limitations.

Findings

01  Some methods approach full-data performance with fewer labels.
02  Learning curves are often non-monotonic and task-dependent.
03  Pretraining, CRF decoding, and label sparsity affect active learning stability.

Abstract

The rapid growth of chemical literature has generated vast amounts of unstructured data, in which reaction information is particularly valuable for applications such as reaction prediction and drug design. However, the prohibitive cost of expert annotation has left training data scarce, severely hindering the performance of automatic reaction extraction. In this work, we conduct a systematic study of active learning for chemical reaction extraction. We integrate six uncertainty- and diversity-based strategies with pretrained transformer-CRF architectures and evaluate them on the product extraction and role labeling tasks. While several methods approach full-data performance with fewer labeled instances, learning curves are often non-monotonic and task-dependent. Our analysis shows that strong pretraining, structured CRF decoding, and label sparsity limit the stability of conventional…
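The abstract does not name its six strategies. As a hedged illustration of what one uncertainty-based and one diversity-based acquisition function can look like on top of a CRF tagger, the sketch below implements two common textbook choices: least-confidence scoring of the CRF's Viterbi path, and k-center greedy (core-set) selection over sentence embeddings. These are assumptions, not necessarily among the paper's methods. The emission matrices stand in for transformer encoder outputs, start/end transitions are omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def best_path_logprob(emissions: Matrix, transitions: Matrix) -> float:
    """log P(y* | x) for the Viterbi path y* of a linear-chain CRF (hypothetical helper).

    emissions[t][k]: score of label k at token t (assumed to come from a transformer).
    transitions[i][j]: score of moving from label i to label j.
    Start/end transition scores are omitted for brevity.
    """
    num_labels = len(emissions[0])
    alpha = list(emissions[0])  # forward-algorithm scores -> log partition function
    delta = list(emissions[0])  # Viterbi scores -> best-path score
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_labels)]) + emissions[t][j]
            for j in range(num_labels)
        ]
        delta = [
            max(delta[i] + transitions[i][j] for i in range(num_labels)) + emissions[t][j]
            for j in range(num_labels)
        ]
    return max(delta) - logsumexp(alpha)


def least_confidence(emissions: Matrix, transitions: Matrix) -> float:
    """Uncertainty score: 1 - P(best path | x). Higher means more uncertain."""
    return 1.0 - math.exp(best_path_logprob(emissions, transitions))


def select_uncertain(pool: List[Matrix], transitions: Matrix, batch_size: int) -> List[int]:
    """Indices of the batch_size unlabeled sentences with the highest least-confidence score."""
    scores = [least_confidence(e, transitions) for e in pool]
    return sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)[:batch_size]


def k_center_greedy(embeddings: Matrix, labeled: List[int], batch_size: int) -> List[int]:
    """Diversity selection: repeatedly pick the point farthest from all covered points.

    Assumes batch_size does not exceed the number of unlabeled points.
    """
    # Distance from each point to its nearest labeled point (inf if nothing is labeled yet).
    min_dist = [min((math.dist(e, embeddings[j]) for j in labeled), default=math.inf) for e in embeddings]
    chosen: List[int] = []
    for _ in range(batch_size):
        candidates = [i for i in range(len(embeddings)) if i not in labeled and i not in chosen]
        best = max(candidates, key=lambda i: min_dist[i])
        chosen.append(best)
        # The newly chosen point now also "covers" its neighbors.
        min_dist = [min(d, math.dist(e, embeddings[best])) for d, e in zip(min_dist, embeddings)]
    return chosen


# Toy usage with made-up scores (two labels, two-token sentences).
transitions = [[0.0, -1.0], [-1.0, 0.0]]
confident = [[5.0, 0.0], [5.0, 0.0]]  # one label clearly dominates each token
ambiguous = [[0.1, 0.0], [0.0, 0.1]]  # near-tied label scores
print(select_uncertain([confident, ambiguous], transitions, batch_size=1))  # [1]

points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 0.0]]
print(k_center_greedy(points, labeled=[0], batch_size=2))  # [3, 2]
```

One design note: sequence-level least confidence tends to favor long sentences, because probability mass is spread over more candidate paths; length-normalized or token-level variants are common alternatives.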
