Protein Representation Learning with Sequence Information Embedding: Does it Always Lead to a Better Performance?
Yang Tan, Lirong Zheng, Bozitao Zhong, Liang Hong, Bingxin Zhou

TL;DR
This paper investigates whether embedding amino acid types always improves protein representation learning, and shows that a structure-only approach can outperform existing sequence- and structure-based methods on structure alignment tasks.
Contribution
Proposes ProtLOCA, a local geometry alignment method based solely on amino acid structure representation, challenging the assumption that embedding amino acid types always enhances performance.
Findings
ProtLOCA outperforms existing sequence- and structure-based representation learning methods in global structure-matching tasks.
It provides a valid solution for local structure pairing among proteins with different structures.
Embedding amino acid types may not always improve deep learning models for protein analysis.
Abstract
Deep learning has become a crucial tool in studying proteins. While the significance of modeling protein structure has been discussed extensively in the literature, amino acid types are typically included in the input by default for many inference tasks. Using a structure alignment task, this study demonstrates that embedding amino acid types may, in some cases, not help a deep learning model learn better representations. To this end, we propose ProtLOCA, a local geometry alignment method based solely on amino acid structure representation. The effectiveness of ProtLOCA is examined on a global structure-matching task over protein pairs, using an independent test dataset based on CATH labels. Our method outperforms existing sequence- and structure-based representation learning methods by matching structurally consistent protein domains more quickly and accurately. Furthermore, in local…
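To make the idea of a structure-only representation concrete, here is a minimal toy sketch in Python. It is not ProtLOCA and does not reproduce the authors' model. It assumes that each protein is given only as C-alpha coordinates, describes each residue by its distances to its nearest residues (a stand-in technique chosen for illustration), and matches a query to the candidate with the closest averaged descriptor. All function names (`local_geometry_features`, `protein_descriptor`, `best_match`) and the toy coordinates are hypothetical.

```python
from typing import Sequence

import numpy as np


def local_geometry_features(ca_coords: np.ndarray, k: int = 4) -> np.ndarray:
    """Per-residue features built from C-alpha coordinates only.

    No amino acid types are used. Each residue is described by the sorted
    distances to its k nearest residues (an illustrative assumption).
    """
    coords = np.asarray(ca_coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)        # (n, n) pairwise distances
    # After sorting, column 0 is the zero self-distance, so it is skipped.
    return np.sort(dist, axis=1)[:, 1:k + 1]    # (n, k)


def protein_descriptor(ca_coords: np.ndarray, k: int = 4) -> np.ndarray:
    """Mean-pool the per-residue features into one vector per protein."""
    return local_geometry_features(ca_coords, k).mean(axis=0)


def best_match(query: np.ndarray, candidates: Sequence[np.ndarray], k: int = 4) -> int:
    """Index of the candidate whose descriptor is closest to the query's."""
    q = protein_descriptor(query, k)
    gaps = [np.linalg.norm(q - protein_descriptor(c, k)) for c in candidates]
    return int(np.argmin(gaps))


# Toy inputs: an idealized helix, a straight strand, and a rigidly moved helix.
idx = np.arange(20)
helix = np.stack(
    [2.3 * np.cos(1.745 * idx), 2.3 * np.sin(1.745 * idx), 1.5 * idx], axis=1
)
strand = np.stack([3.8 * idx, np.zeros(20), np.zeros(20)], axis=1)

theta = 0.7  # rotation about the z-axis, in radians
rot = np.array(
    [
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ]
)
moved_helix = helix @ rot.T + np.array([10.0, -5.0, 3.0])

print(best_match(helix, [strand, moved_helix]))
```

Because the features are distances, they do not change under rotation or translation, so the moved helix is matched to the original helix rather than to the strand. A learned method would replace these hand-built features with trained representations.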
Taxonomy
Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research · Genomics and Phylogenetic Studies
