GBNL: Graded Betti Number Learning of Complex Biological Data
Mushal Zia, Faisal Suwayyid, Guo-Wei Wei

TL;DR
This paper introduces GBNL, a novel method applying graded Betti numbers from persistent commutative algebra to biological sequence data, improving protein-nucleic acid binding prediction by capturing multiscale topological features.
Contribution
The authors present this as the first use of graded Betti numbers in machine learning for biological data, integrating algebraic topology with transformer models for enhanced sequence analysis.
Findings
GBNL effectively detects single-site mutations.
It distinguishes complex mutation patterns.
Numerical studies show improved prediction accuracy.
Abstract
While persistent homology is widely used for data shape analysis, persistent commutative algebra (PCA) has seen limited adoption in machine learning and data science. Unlike persistent homology, which delivers topological invariants in the form of Betti numbers, PCA provides both algebraic invariants and graded Betti numbers. However, graded Betti numbers have seldom been applied to real-world data. In this work, we introduce the first-of-its-kind application of commutative algebra graded Betti numbers in machine learning and data science. Specifically, we present Graded Betti Number Learning (GBNL) for protein-nucleic acid binding prediction. Protein-DNA/RNA interactions are fundamental to cellular processes such as replication, transcription, translation, and gene regulation, and their understanding and prediction remain challenging. GBNL represents each nucleic acid sequence as a…
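To make the term "graded Betti numbers" concrete, here is a minimal sketch that computes them for the Stanley-Reisner ring of a small simplicial complex using Hochster's formula, β_{i,j} = Σ_{|W|=j} dim H̃_{j-i-1}(Δ_W), where Δ_W is the complex restricted to the vertex subset W. This is a generic textbook illustration and not the GBNL pipeline: the abstract above is truncated and does not say how the paper builds its complexes or filtrations. The function names (`gf2_rank`, `faces_of`, `reduced_betti`, `graded_betti`) are hypothetical, the persistence (multiscale) aspect is omitted, and homology is taken over GF(2) as a simplifying assumption (in general, Betti numbers can depend on the field characteristic).

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]  # eliminate the leading bit
    return len(pivots)

def faces_of(facets):
    """All faces (including the empty face) of the complex spanned by the facets."""
    faces = {()}
    for facet in facets:
        facet = tuple(sorted(facet))
        for r in range(1, len(facet) + 1):
            faces.update(combinations(facet, r))
    return faces

def reduced_betti(faces, d):
    """Dimension of the reduced homology group in degree d over GF(2); d = -1 is allowed."""
    def boundary_rank(k):
        # Rank of the boundary map from k-faces to (k-1)-faces.
        if k < 0:
            return 0
        src = [f for f in faces if len(f) == k + 1]
        tgt = {f: n for n, f in enumerate(sorted(f for f in faces if len(f) == k))}
        if not src or not tgt:
            return 0
        rows = []
        for f in src:
            mask = 0
            for v in range(len(f)):
                mask |= 1 << tgt[f[:v] + f[v + 1:]]  # drop one vertex at a time
            rows.append(mask)
        return gf2_rank(rows)

    n_d = sum(1 for f in faces if len(f) == d + 1)
    return n_d - boundary_rank(d) - boundary_rank(d + 1)

def graded_betti(vertices, facets):
    """Graded Betti numbers {(i, j): value} of the Stanley-Reisner ring (Hochster's formula)."""
    faces = faces_of(facets)
    betti = {}
    for j in range(len(vertices) + 1):
        for W in combinations(sorted(vertices), j):
            w_set = set(W)
            restricted = {f for f in faces if set(f) <= w_set}
            for i in range(j + 1):
                b = reduced_betti(restricted, j - i - 1)
                if b:
                    betti[(i, j)] = betti.get((i, j), 0) + b
    return betti

# Two isolated points: the Stanley-Reisner ideal is (x1*x2), one generator of degree 2.
print(graded_betti([1, 2], [(1,), (2,)]))  # {(0, 0): 1, (1, 2): 1}
```

The grading is the point of the example: an ordinary Betti number would only record that two isolated points have one extra connected component, while the graded version also records the degree (here 2) at which the corresponding generator of the ideal appears.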
