BioKlustering: a web app for semi-supervised learning of maximally imbalanced genomic data
Samuel Ozminkowski, Yuke Wu, Hailey Bruzzone, Liule Yang, Zhiwen Xu, Luke Selberg, Chunrong Huang, Helena Jaramillo-Mesa, Claudia Solis-Lemus

TL;DR
BioKlustering is an accessible web application designed for semi-supervised learning on highly imbalanced and unaligned genomic data, enabling phenotype prediction even with minimal or partial labels and small sample sizes.
Contribution
It introduces a novel web tool that handles maximally imbalanced and unaligned genomic data for semi-supervised learning, expanding applicability in biological research.
Findings
Supports maximally imbalanced label settings including single-class observations
Handles unaligned sequences for diverse genomic data
Effective with small sample sizes
Abstract
Summary: Accurate phenotype prediction from genomic sequences is a highly coveted task in biological and medical research. While machine learning holds the key to accurate prediction in a variety of fields, the complexity of biological data can render many methodologies inapplicable. We introduce BioKlustering, a user-friendly, open-source, and publicly available web app for unsupervised and semi-supervised learning, specialized for cases when sequence alignment and/or experimental phenotyping of all classes are not possible. Among its main advantages, BioKlustering 1) allows for maximally imbalanced settings of partially observed labels, including cases when only one class is observed, which is currently prohibited in most semi-supervised methods, 2) takes unaligned sequences as input and thus allows learning for widely diverse sequences (impossible to align) such as viruses and bacteria,…
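To make the setting concrete, the sketch below shows one possible way to learn from unaligned sequences when only a single class has labels. It is not BioKlustering's implementation: the k-mer frequency features, the seeded 2-means step, and every name in it (`kmer_profile`, `seeded_two_means`, `predict`, the label strings, the toy sequences) are assumptions chosen for illustration, using only the Python standard library.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=2):
    """Normalized k-mer frequencies over ACGT; needs no alignment, any length works."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # ignore k-mers with ambiguous bases
    return [counts[m] / total for m in kmers]


def distance(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def seeded_two_means(profiles, labeled_idx, iters=10):
    """2-means where cluster 0 is seeded by, and always keeps, the labeled samples."""
    dim = len(profiles[0])

    def mean(idxs):
        return [sum(profiles[i][d] for i in idxs) / len(idxs) for d in range(dim)]

    unlabeled = [i for i in range(len(profiles)) if i not in labeled_idx]
    c0 = mean(sorted(labeled_idx))
    # Seed the second cluster with the unlabeled sample farthest from the labeled centroid.
    c1 = profiles[max(unlabeled, key=lambda i: distance(profiles[i], c0))]
    assign = []
    for _ in range(iters):
        assign = [
            0 if i in labeled_idx or distance(p, c0) <= distance(p, c1) else 1
            for i, p in enumerate(profiles)
        ]
        zeros = [i for i, a in enumerate(assign) if a == 0]
        ones = [i for i, a in enumerate(assign) if a == 1]
        c0 = mean(zeros)  # never empty: labeled samples are pinned to cluster 0
        if ones:
            c1 = mean(ones)
    return assign


def predict(seqs, labels, k=2, other="unobserved"):
    """labels holds one observed class name or None; returns a label per sequence."""
    labeled_idx = {i for i, lab in enumerate(labels) if lab is not None}
    if not labeled_idx or len(labeled_idx) == len(seqs):
        raise ValueError("need at least one labeled and one unlabeled sequence")
    observed = labels[min(labeled_idx)]  # assumption: exactly one class is observed
    profiles = [kmer_profile(s, k) for s in seqs]
    assign = seeded_two_means(profiles, labeled_idx)
    return [observed if a == 0 else other for a in assign]


# Toy data (hypothetical): unaligned sequences of different lengths, one labeled.
seqs = ["ATATATTATAAT", "AATTATATAT", "TATAATATTAAATA",
        "GCGCGGCCGC", "CGGCGCGCCGGC", "GGCCGCGC"]
labels = ["class_A", None, None, None, None, None]
predictions = predict(seqs, labels)
print(predictions)
```

Pinning the labeled samples to one cluster is what lets a single observed class guide the grouping; a standard classifier could not be trained here because it would see no examples of the second class.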
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Gene expression and cancer classification
