Domain adaptation in small-scale and heterogeneous biological datasets
Seyedmehdi Orouji, Martin C. Liu, Tal Korem, Megan A. K. Peters

TL;DR
This paper reviews domain adaptation techniques tailored for small, heterogeneous biological datasets, highlighting their potential to improve model generalization across diverse biological studies.
Contribution
It provides a synthesis of domain adaptation methods suited to small-scale, complex biological data, emphasizing open challenges and future directions.
Findings
Domain adaptation can improve model transferability in biology.
Most methods are designed for large-scale data, not small, heterogeneous datasets.
Customized domain adaptation approaches are needed for biological applications.
Abstract
Machine learning techniques are steadily becoming more important in modern biology, and are used to build predictive models, discover patterns, and investigate biological problems. However, models trained on one dataset are often not generalizable to other datasets from different cohorts or laboratories, due to differences in the statistical properties of these datasets. These could stem from technical differences, such as the measurement technique used, or from relevant biological differences between the populations studied. Domain adaptation, a type of transfer learning, can alleviate this problem by aligning the statistical distributions of features and samples among different datasets so that similar models can be applied across them. However, a majority of state-of-the-art domain adaptation methods are designed to work with large-scale data, mostly text and images, while biological…
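To make the idea of "aligning the statistical distributions of features" concrete, here is a minimal sketch of one well-known unsupervised approach, correlation alignment (CORAL), which matches the feature covariance of a source dataset to that of a target dataset. It is an illustration only and is not claimed to be a method from this paper. The function names `coral_align` and `_sym_power`, the regularizer `eps`, and the added mean shift are assumptions chosen for the example; the regularizer matters when there are few samples relative to features, as is common in biological data.

```python
import numpy as np


def _sym_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against tiny negative values
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral_align(source, target, eps=1.0):
    """Map source samples so their mean and covariance match the target's.

    source, target: arrays of shape (n_samples, n_features).
    eps: ridge term added to both covariances (hypothetical default; tune per dataset).
    """
    n_features = source.shape[1]
    ridge = eps * np.eye(n_features)
    cov_source = np.cov(source, rowvar=False) + ridge
    cov_target = np.cov(target, rowvar=False) + ridge

    # Whiten the centered source features, then re-color with the target covariance.
    centered = source - source.mean(axis=0)
    aligned = centered @ _sym_power(cov_source, -0.5) @ _sym_power(cov_target, 0.5)

    # Shift to the target mean (an addition for this sketch; classic CORAL
    # assumes features were standardized beforehand).
    return aligned + target.mean(axis=0)
```

A model trained on `coral_align(source, target)` with the source labels can then be applied to the unmodified target data. Only second-order statistics are matched, so differences in higher-order structure or in label distributions between cohorts are left untouched.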
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics
