Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
Zixuan Cang, Lin Mu, Guowei Wei

TL;DR
This paper develops algebraic topology methods that represent biomolecules for machine learning, improving accuracy in binding affinity prediction and virtual screening.
Contribution
The paper introduces three topological approaches (multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence) that enrich molecular representations for machine learning applications.
Findings
The proposed topological methods are reported to outperform existing methods in binding affinity prediction.
They achieve higher accuracy in ligand-decoy discrimination.
They remain effective on large-scale protein-ligand datasets.
Abstract
This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of…
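
To make the idea of a component-aware topological descriptor concrete, here is a minimal sketch, not the authors' implementation. It assumes that "multicomponent" can be read as restricting the point cloud to chosen element types before computing persistence, and it computes only 0-dimensional Vietoris-Rips bars, whose death times equal the edge lengths of a minimum spanning tree. The function names `h0_death_times` and `element_specific_h0`, and the toy coordinates, are hypothetical and chosen for illustration only.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Death times of 0-dimensional Vietoris-Rips bars (all born at 0).

    Components merge in order of increasing pairwise distance, so the
    death times are the edge lengths of a minimum spanning tree.
    The one bar that never dies is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one bar dies
            parent[ri] = rj
            deaths.append(dist)
    return deaths


def element_specific_h0(atoms, elements):
    """Keep only atoms whose symbol is in `elements`, then compute H0 bars.

    `atoms` is a list of (element_symbol, (x, y, z)) pairs (assumed format).
    """
    selected = [xyz for symbol, xyz in atoms if symbol in elements]
    return h0_death_times(selected)


# Toy molecule with made-up coordinates, for illustration only.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("N", (0.0, 1.0, 0.0)),
    ("O", (4.0, 0.0, 0.0)),
]
carbon_bars = element_specific_h0(atoms, {"C"})
carbon_nitrogen_bars = element_specific_h0(atoms, {"C", "N"})
```

Each choice of element set yields a different list of bar lengths, so the same structure produces several complementary feature vectors. Higher-dimensional bars (loops, cavities), the Wasserstein distance between barcodes, and the electrostatic variant mentioned in the abstract are outside the scope of this sketch.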
