M3Site: multiclass multimodal learning for protein active site identification and classification
Song Ouyang, Yong Luo, Huiyu Cai, Kehua Su, Fei Liao, Na Zhan, Huangxuan Zhao, Tailang Yin, Lin Zhao, Dongjing Shan

TL;DR
M3Site is a multimodal method that combines protein sequence, structure, and functional text data to identify and classify protein active sites at the residue level, with applications in drug design and synthetic biology.
Contribution
M3Site introduces a multiclass multimodal framework for protein active site prediction, combining sequence, structure, and text data.
Findings
M3Site outperforms existing models in identifying and classifying protein active sites.
The framework integrates sequence, structural, and functional data for residue-level predictions.
An interactive application enhances practical utility for predictions and visualizations.
Abstract
Accurately identifying and classifying protein active sites is crucial for understanding protein mechanisms, drug design, and synthetic biology. Current methods often rely on binary classification and single-modal data, limiting their scope. To address these limitations, we propose M3Site, a multimodal framework that integrates protein sequence embeddings, structural graph representations, and functional text annotations for residue-level, multiclass active site prediction. Built upon a curated dataset of 25 883 proteins sourced from UniProt and AlphaFold2, M3Site…
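To make "residue-level, multiclass prediction from several modalities" concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the concatenation-based fusion, the single linear layer, the random weights, all dimensions, and the convention that class 0 means "not an active site" are assumptions chosen only for illustration.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_residue_features(seq_emb, struct_emb, text_emb):
    """Concatenate per-residue sequence and structure features with one
    protein-level text embedding that is repeated for every residue.
    (Hypothetical fusion scheme, assumed for this sketch.)"""
    if len(seq_emb) != len(struct_emb):
        raise ValueError("sequence and structure features must cover the same residues")
    return [s + g + text_emb for s, g in zip(seq_emb, struct_emb)]

def classify_residues(fused, weights, bias):
    """Apply one linear layer plus softmax to each residue.
    weights holds one weight vector per class; bias holds one scalar per class."""
    probs = []
    for x in fused:
        logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
                  for w, b in zip(weights, bias)]
        probs.append(softmax(logits))
    # Predicted class per residue: index of the highest probability.
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, labels

# Toy inputs: 5 residues, made-up feature sizes, 3 classes
# (assumed: class 0 = not an active site, classes 1-2 = two site types).
rng = random.Random(0)
n_res, d_seq, d_struct, d_text, n_classes = 5, 4, 3, 2, 3

def rand_vec(d):
    return [rng.uniform(-1.0, 1.0) for _ in range(d)]

seq_emb = [rand_vec(d_seq) for _ in range(n_res)]        # stand-in for sequence embeddings
struct_emb = [rand_vec(d_struct) for _ in range(n_res)]  # stand-in for structural graph features
text_emb = rand_vec(d_text)                              # stand-in for a text annotation embedding

fused = fuse_residue_features(seq_emb, struct_emb, text_emb)
weights = [rand_vec(d_seq + d_struct + d_text) for _ in range(n_classes)]
bias = [0.0] * n_classes
probs, labels = classify_residues(fused, weights, bias)
```

The point of the multiclass output is that each residue receives a probability for every class, so a single pass both locates candidate active site residues (any label other than 0) and assigns them a site type, where a binary classifier would only answer the first question.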
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies
