Accurate and efficient protein embedding using multi-teacher distillation learning

Jiayu Shang; Cheng Peng; Yongxin Ji; Jiaojiao Guan; Dehan Cai; Xubo Tang; Yanni Sun

arXiv:2405.11735 · q-bio.GN · May 21, 2024 · Bioinform.

TL;DR

This paper introduces a multi-teacher distillation learning method for protein embedding that significantly reduces computational costs while maintaining high accuracy, facilitating large-scale protein analysis.

Contribution

The paper presents a novel distillation approach leveraging multiple pre-trained models to create efficient, compact protein embeddings with minimal performance loss.

Findings

1. Reduces computational time by approximately 70%.
2. Maintains nearly the same accuracy as larger models.
3. Enables efficient large-scale protein analysis.

Abstract

Motivation: Protein embedding, which represents proteins as numerical vectors, is a crucial step in various learning-based protein annotation/classification problems, including gene ontology prediction, protein-protein interaction prediction, and protein structure prediction. However, existing protein embedding methods are often computationally expensive due to their large number of parameters, which can reach millions or even billions. The growing availability of large-scale protein datasets and the need for efficient analysis tools have created a pressing demand for efficient protein embedding methods.

Results: We propose a novel protein embedding approach based on multi-teacher distillation learning, which leverages the knowledge of multiple pre-trained protein embedding models to learn a compact and informative representation of proteins. Our method achieves comparable performance…
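The abstract does not spell out the training objective, so the following is only a generic sketch of how a multi-teacher distillation loss is commonly set up, not the authors' formulation. It assumes the student produces one predicted vector per teacher (for example through a per-teacher projection) and is penalized by a weighted mean squared error against each teacher's embedding. The names `mse`, `multi_teacher_loss`, `student_views`, and `weights` are hypothetical and chosen for illustration.

```python
from typing import Optional, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_loss(
    student_views: Sequence[Vector],
    teacher_embeddings: Sequence[Vector],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of per-teacher MSE terms.

    student_views[i] is the student's prediction of teacher i's embedding
    (an assumption of this sketch); teachers may have different dimensions.
    """
    if len(student_views) != len(teacher_embeddings):
        raise ValueError("need one student view per teacher")
    if weights is None:
        # Assumption: treat all teachers equally when no weights are given.
        weights = [1.0] * len(teacher_embeddings)
    if len(weights) != len(teacher_embeddings):
        raise ValueError("need one weight per teacher")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        w * mse(s, t)
        for w, s, t in zip(weights, student_views, teacher_embeddings)
    )
    return weighted / total_weight


# Toy example: two teachers with embedding sizes 2 and 1.
student = [[1.0, 2.0], [0.0]]
teachers = [[1.0, 2.0], [1.0]]
loss_equal = multi_teacher_loss(student, teachers)
loss_weighted = multi_teacher_loss(student, teachers, weights=[3.0, 1.0])
```

In the toy example the student matches the first teacher exactly and misses the second by 1, so the relative teacher weights determine how much that miss contributes to the total loss.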

Code & Models

Repositories

KennthShang/MTDP (PyTorch, official)

Taxonomy

Topics: Machine Learning in Bioinformatics

Methods: Ontology