An Active Learning Framework for Data-Efficient, Human-in-the-Loop Enzyme Function Prediction
Ashley Babjac, Adrienne Hoarfrost

TL;DR
This paper presents HATTER, an active learning framework that efficiently improves enzyme function prediction by integrating human-in-the-loop annotation, reducing data and computational requirements while maintaining high performance.
Contribution
Introduction of HATTER, a modular active learning framework that combines multiple strategies with human input for scalable, data-efficient enzyme function prediction.
Findings
Active learning matches standard training performance with less data.
Point-based uncertainty sampling methods perform as well as or better than more complex methods.
Human-in-the-loop active learning accelerates enzyme discovery efficiently.
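To make "point-based uncertainty sampling" concrete, here is a minimal sketch of three common point-based scores (least confidence, margin, and entropy), each computed from a single vector of predicted class probabilities. The summary does not say which scores HATTER implements, so the function names, the toy probabilities, and the `select_batch` helper are illustrative assumptions rather than the framework's API.

```python
import math

def least_confidence(probs):
    # 1 minus the top predicted probability: higher means less certain.
    return 1.0 - max(probs)

def margin(probs):
    # 1 minus the gap between the two most likely classes.
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])

def entropy(probs):
    # Shannon entropy (in nats) of the predictive distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_batch(pool_probs, k, score=entropy):
    # Hypothetical helper: indices of the k highest-scoring pool items.
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Toy predicted probabilities over three enzyme classes (made-up values).
pool = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # highly uncertain prediction
    [0.70, 0.20, 0.10],  # moderately uncertain prediction
]
picked = select_batch(pool, k=2)
```

Each score needs only one forward pass per candidate, which is why such methods are cheap compared with approaches that require ensembles or repeated sampling.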
Abstract
Generalizable protein function prediction is increasingly constrained by the growing mismatch between exponentially expanding sequences of environmental proteins and the comparatively slow accumulation of experimentally verified functional data. Active learning offers a promising path forward for accelerating biological function prediction by selecting the most informative proteins to experimentally annotate for data-efficient training, yet its potential remains largely unexplored. We introduce HATTER (Human-in-the-loop Adaptive Toolkit for Transferable Enzyme Representations), a modular framework that integrates multiple active learning strategies with human-in-the-loop experimental annotation to efficiently fine-tune function prediction models. We compare active learning training to standard supervised training for biological enzyme function prediction, demonstrating that active…
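The select, annotate, retrain cycle described in the abstract can be sketched as a short loop. This is a toy illustration under stated assumptions, not HATTER's implementation: the one-dimensional features, the nearest-centroid model standing in for a fine-tuned predictor, and `toy_oracle` standing in for a human experimental annotation are all hypothetical.

```python
import math

def fit_centroids(labeled):
    # labeled: list of (x, y) pairs with x a float; returns {label: mean x}.
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_proba(centroids, x):
    # Softmax over negative distances to each class centroid.
    labels = sorted(centroids)
    logits = [-abs(x - centroids[y]) for y in labels]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def active_learning_loop(labeled, pool, oracle, rounds, batch_size):
    # Each round: refit, rank the pool by uncertainty, query the oracle.
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        centroids = fit_centroids(labeled)
        pool.sort(key=lambda x: entropy(predict_proba(centroids, x)),
                  reverse=True)
        queries, pool = pool[:batch_size], pool[batch_size:]
        # The oracle call is where a human annotator would supply labels.
        labeled += [(x, oracle(x)) for x in queries]
    return fit_centroids(labeled), labeled

def toy_oracle(x):
    # Hypothetical stand-in for an experimental annotation.
    return "A" if x < 5.0 else "B"

seed = [(0.0, "A"), (10.0, "B")]
pool = [1.0, 4.5, 5.5, 9.0, 4.9, 2.0]
model, labeled = active_learning_loop(seed, pool, toy_oracle,
                                      rounds=2, batch_size=2)
```

In this toy run the candidates nearest the decision boundary are queried first, so the annotation budget goes to the examples the current model is least sure about.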
Taxonomy
Topics: Protein Structure and Dynamics · Gene Regulatory Network Analysis · Microbial Metabolic Engineering and Bioproduction
