Machine Learning Methods for Small Data and Upstream Bioprocessing Applications: A Comprehensive Review
Johnny Peng, Thanh Tung Khuat, Katarzyna Musial, Bogdan Gabrys

TL;DR
This review examines machine learning techniques tailored for small datasets in upstream bioprocessing, providing a taxonomy, analyzing methods, and offering practical insights for data-constrained biopharmaceutical applications.
Contribution
The review introduces a taxonomy of ML methods for small data, analyzes each method's core concepts, and evaluates each method's effectiveness in bioprocessing contexts.
Findings
Taxonomy of ML methods for small data in bioprocessing
Analysis of method effectiveness in real-world applications
Identification of research gaps and practical guidance
Abstract
Data is crucial for machine learning (ML) applications, yet acquiring large datasets can be costly and time-consuming, especially in complex, resource-intensive fields like biopharmaceuticals. A key process in this industry is upstream bioprocessing, where living cells are cultivated and optimised to produce therapeutic proteins and biologics. The intricate nature of these processes, combined with high resource demands, often limits data collection, resulting in smaller datasets. This comprehensive review explores ML methods designed to address the challenges posed by small data and classifies them into a taxonomy to guide practical applications. Furthermore, each method in the taxonomy was thoroughly analysed, with a detailed discussion of its core concepts and an evaluation of its effectiveness in tackling small data challenges, as demonstrated by application results in the upstream…
Taxonomy
Topics: Viral Infectious Diseases and Gene Expression in Insects · vaccines and immunoinformatics approaches · Transgenic Plants and Applications
