# Sequence-Based Prediction for Protein Solvent Accessibility

**Authors:** Yang Yang, Mengqi Chen, Congrui Liu, Mauno Vihinen

PMC · DOI: 10.3390/ijms26125604 · International Journal of Molecular Sciences · 2025-06-11

## TL;DR

This paper introduces SolAcc, a tool that predicts which amino acid residues of a protein are exposed on its surface and which are buried in its core, using only the protein's sequence as input.

## Contribution

A novel sequence-based predictor of amino acid solvent accessibility, built on a long short-term memory (LSTM) deep learning model and using features derived from 3D protein structures.

## Key findings

- The LSTM-based SolAcc outperformed state-of-the-art predictors in a blind test.
- Features derived from 3D structures improved sequence-based accessibility prediction.
- SolAcc is freely available for use in protein function and structure studies.
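As an illustration of how a blind-test comparison between predictors might be scored, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) for a two-class buried/exposed labelling. The choice of metrics, the class names, and the toy labels are assumptions made for this example; the summary does not state which measures or classes the authors used.

```python
import math

def confusion_counts(true_labels, predicted_labels, positive="exposed"):
    """Count TP, FP, TN, FN for a two-class problem (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of residues labelled correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy per-residue labels, invented purely for illustration.
truth = ["exposed", "exposed", "buried", "buried", "exposed", "buried"]
predicted = ["exposed", "buried", "buried", "buried", "exposed", "exposed"]

counts = confusion_counts(truth, predicted)
print("TP, FP, TN, FN:", counts)
print("accuracy:", round(accuracy(*counts), 3))
print("MCC:", round(mcc(*counts), 3))
```

MCC is often reported alongside accuracy because it stays informative when buried and exposed residues are unevenly represented in the test set.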

## Abstract

When globular proteins fold into their characteristic three-dimensional structures, some amino acids are located on the surface, while others are situated in the protein core, where they cannot interact with molecules in the environment. Predicting the degree of solvent accessibility of amino acids provides insight into the function and relevance of residues. Residue accessibility is crucial for several protein functions, including enzymatic activity, allostery, multimer formation, binding to other molecules, and immunogenicity. We developed a novel sequence-based predictor for amino acid accessibility with features derived from three-dimensional protein structures. Several machine learning algorithms were tested, and the long short-term memory (LSTM) deep learning method demonstrated the best performance; thus, it was utilized to develop the freely available SolAcc tool. It showed superior performance compared to state-of-the-art predictors in a blind test.
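To make the surface-versus-core distinction concrete, the sketch below shows one common way to express accessibility: a residue's solvent-accessible surface area (ASA) divided by a maximum value for that residue type, then thresholded into buried or exposed. This is a generic convention, not the method described for SolAcc. The `MAX_ASA` values are approximate placeholders in the style of published maximum-ASA scales, and the 25% threshold and the function names are assumptions for illustration.

```python
# Approximate maximum ASA values in square angstroms for a few residue types.
# Placeholder numbers for illustration only; a real analysis should take them
# from a published reference scale.
MAX_ASA = {"A": 129.0, "G": 104.0, "K": 236.0, "L": 201.0, "P": 159.0, "W": 285.0}

def relative_accessibility(residue, asa):
    """Return ASA as a fraction of the residue type's maximum, capped at 1.0."""
    return min(asa / MAX_ASA[residue], 1.0)

def classify(rsa, threshold=0.25):
    """Label a residue 'exposed' or 'buried' (threshold is an assumed cutoff)."""
    return "exposed" if rsa >= threshold else "buried"

# Invented (residue, ASA) pairs standing in for values measured on a structure.
observed = [("W", 28.5), ("K", 118.0), ("A", 12.9), ("G", 104.0)]

for residue, asa in observed:
    rsa = relative_accessibility(residue, asa)
    print(residue, round(rsa, 2), classify(rsa))
```

Labels of this kind, computed from known structures, are the sort of target a sequence-based predictor is trained to reproduce when no structure is available.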

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, BTK (Bruton tyrosine kinase) [NCBI Gene 695] {aka AGMX1, AT, ATK, BPK, IGHD3, IMD1}
- **Diseases:** X-linked agammaglobulinemia (MESH:C537409)
- **Chemicals:** ibrutinib (MESH:C551803), Tryptophan (MESH:D014364), Proline (MESH:D011392), amino acid (MESH:D000596)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12193430/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12193430/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC12193430/full.md

---
Source: https://tomesphere.com/paper/PMC12193430