Hydropathy Conformational Letter and its Substitution Matrix HP-CLESUM: an Application to Protein Structural Alignment
Sheng Wang

TL;DR
This paper introduces a novel joint alphabet combining amino acid hydropathy and conformational letters, along with a substitution matrix, to improve protein structural alignment accuracy.
Contribution
It develops a hydropathy conformational letter (hp-CL) and a corresponding substitution matrix, enhancing protein structure comparison methods.
Findings
Optimizing the amino acid reduction for mutual information with the structural alphabet yields two groups, hydrophobic and hydrophilic.
The hp-CL matrix outperforms traditional systems in alignment accuracy.
Embedding hp-CL in alignment algorithms enhances performance on benchmark datasets.
Abstract
Motivation: The protein sequence world is discrete, consisting of 20 amino acids (AA), while the structure world is continuous, although it can be discretized into structural alphabets (SA). To reveal the relationship between sequence and structure, it is useful to consider both AA and SA in a joint space. However, such a space has too many parameters, so the AA alphabet must be reduced to bring the number of parameters down. Results: We have developed a simple but effective approach called entropic clustering, which selects the reduction of AAs that has the highest mutual information with the SAs. The optimized reduction of AA into two groups yields a hydrophobic group and a hydrophilic group. Combining this with our SA, the conformational letter (CL) with 17 letters, gives a joint alphabet called the hydropathy conformational letter (hp-CL). A joint substitution matrix with (17*2)*(17*2) entries is derived from FSSP.…
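The idea of choosing an amino acid reduction by its mutual information with a structural alphabet can be illustrated with a small sketch. This is not the paper's implementation: the function names (`mutual_information`, `best_two_group_reduction`, `hp_cl_index`), the exhaustive search, the toy count table, and the index ordering of the joint alphabet are all assumptions made for illustration. The exhaustive search is only practical for a handful of letters; the abstract does not say how the search over reductions is carried out.

```python
import itertools
import numpy as np

def mutual_information(joint_counts):
    """Mutual information (in bits) of a 2-D contingency table of counts."""
    p = joint_counts / joint_counts.sum()
    px = p.sum(axis=1, keepdims=True)   # row marginal, shape (rows, 1)
    py = p.sum(axis=0, keepdims=True)   # column marginal, shape (1, cols)
    nz = p > 0                          # skip empty cells to avoid log(0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

def best_two_group_reduction(counts, letters):
    """Try every split of the rows into two non-empty groups and keep the
    split whose merged 2 x n_SA table has the highest mutual information.
    Illustrative brute force; assumes a small alphabet."""
    n = len(letters)
    best_mi, best_groups = -1.0, None
    # Letter 0 always goes into group A so each split is visited once.
    for r in range(n - 1):
        for rest in itertools.combinations(range(1, n), r):
            mask = np.zeros(n, dtype=bool)
            mask[[0, *rest]] = True
            reduced = np.vstack([counts[mask].sum(axis=0),
                                 counts[~mask].sum(axis=0)])
            mi = mutual_information(reduced)
            if mi > best_mi:
                group_a = [letters[i] for i in range(n) if mask[i]]
                group_b = [letters[i] for i in range(n) if not mask[i]]
                best_mi, best_groups = mi, (group_a, group_b)
    return best_mi, best_groups

def hp_cl_index(hp, cl, n_cl=17):
    """Map (hydropathy group in {0, 1}, conformational letter in 0..16) to a
    single index in 0..33. The ordering is an assumption."""
    return hp * n_cl + cl

# Toy counts (made up): rows are amino acids, columns are two structural letters.
letters = ["I", "L", "K", "E"]
counts = np.array([[90, 10],
                   [80, 20],
                   [15, 85],
                   [10, 90]])
mi, groups = best_two_group_reduction(counts, letters)
print(groups, round(mi, 3))
```

With a 2-group reduction and 17 conformational letters, `hp_cl_index` covers 34 joint letters, which matches the (17*2)*(17*2) size of the joint substitution matrix described in the abstract.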
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Enzyme Structure and Function
