# Chlamy_ChloroPred: a deep learning-based, highly accurate binary classifier for chloroplast protein prediction in the model microalga, Chlamydomonas reinhardtii, with potential cross-proteome versatility

**Authors:** Hong Il Choi, Sung Ho Lee, Il Hyung Lee, Yong Jae Lee, Jin-Ho Yun, Dong-Yun Choi, Dae-Hyun Cho, Bum-Soo Shin, Junyoung Chun, Dong Won Lee, Hee-Sik Kim

PMC · DOI: 10.3389/fmicb.2026.1744805 · Frontiers in Microbiology · 2026-02-23

## TL;DR

Chlamy_ChloroPred is a deep learning binary classifier that predicts chloroplast proteins in Chlamydomonas reinhardtii with 0.8462 accuracy and, although trained only on the algal proteome, reaches 0.7316 accuracy on Arabidopsis thaliana.

## Contribution

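A deep learning model for binary chloroplast protein prediction that combines ProtBERT-BFD embeddings, stacked BiLSTM networks, and an attentive pooling layer. It is trained on the C. reinhardtii proteome and also evaluated on A. thaliana.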

## Key findings

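- Chlamy_ChloroPred achieved 0.8462 accuracy on the C. reinhardtii proteome, outperforming TargetP 1.1 (0.4970), TargetP 2.0 (0.7396), and PredAlgo (0.7738), and performing close to the state-of-the-art PB-Chlamy (0.8521).
- Trained solely on the algal proteome, the model reached 0.7316 accuracy on Arabidopsis thaliana, a 12.6% improvement over TargetP 2.0.
- The authors attribute this cross-species performance to the model likely capturing conserved features of chloroplast proteins across diverse photosynthetic lineages.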

## Abstract

The chloroplast, a living relic of an ancient endosymbiotic interaction between a microalga and a microbe and the principal subcellular organelle responsible for biological CO2 assimilation, is emerging as a key target for research to enhance photosynthetic efficiency beyond its current limitations. Given that accurate protein localization is a prerequisite for the in-depth scientific investigation and practical application of the membrane-compartmentalized photosynthetic organelle, numerous computational prediction tools have been proposed, yet their accuracy remains unsatisfactory.

To address this limitation, we present Chlamy_ChloroPred, a deep learning-based framework composed of multi-layered artificial neural networks and designed to perform binary classification of chloroplast proteins in the model photosynthetic microorganism, Chlamydomonas reinhardtii. The model captures locality-aware features of determinant amino acid residues in the chloroplast transit peptide (cTP), generally located within the ~50-amino-acid N-terminal region of mature chloroplast proteins, by integrating ProtBERT-BFD embeddings, stacked bidirectional long short-term memory (BiLSTM) networks, and an attentive pooling layer.
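The attentive pooling step can be illustrated with a minimal, dependency-free sketch. It assumes dot-product scoring against a single query vector, which the abstract does not specify, and it stands in plain lists of floats for the per-residue outputs of the ProtBERT-BFD and BiLSTM layers, which are not reproduced here. The names `attentive_pool`, `softmax`, and `n_terminal_window` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def n_terminal_window(sequence: str, length: int = 50) -> str:
    """Return the first `length` residues, where the cTP is generally found."""
    return sequence[:length]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(
    residue_vectors: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Collapse per-residue vectors into one vector using attention weights.

    Assumption: each residue is scored by its dot product with `query`;
    the published model may parameterize attention differently.
    """
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in residue_vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, residue_vectors))
        for d in range(len(query))
    ]
    return pooled, weights
```

Returning the weights alongside the pooled vector is a deliberate choice in this sketch: the per-residue weights indicate which positions contributed most to the pooled representation, which is one common way an attention layer supports interpretation.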

Our model achieved an accuracy of 0.8462 for the C. reinhardtii proteome, outperforming widely used localization predictors, including TargetP 1.1 (0.4970), TargetP 2.0 (0.7396), and PredAlgo (0.7738) under a binary classification scheme. Comparative analyses further demonstrated that Chlamy_ChloroPred exhibits competitive performance relative to the current state-of-the-art model, PB-Chlamy (0.8521), under identical evaluation conditions. Notably, despite being trained solely on the algal proteome, Chlamy_ChloroPred showed substantial cross-species versatility when applied to the proteome of the terrestrial plant, Arabidopsis thaliana, achieving an accuracy of 0.7316 – representing a 12.6% improvement over TargetP 2.0, a predictor with previously demonstrated cross-proteome versatility. This likely stems from the model’s robust ability to capture conserved features of chloroplast proteins across proteomes from diverse photosynthetic lineages.
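The binary classification scheme used for comparison can be sketched as follows: predictions from a multi-class localization predictor are collapsed to chloroplast versus non-chloroplast before accuracy is computed. The label strings and the helper names (`to_binary`, `binary_accuracy`, `accuracy_gap`) are assumptions for illustration, and the paper's exact mapping is not given in the abstract. The accuracy table contains only the C. reinhardtii values reported above.

```python
from typing import Dict, Sequence

# Accuracies on the C. reinhardtii proteome as reported in the abstract.
REPORTED_ACCURACY: Dict[str, float] = {
    "TargetP 1.1": 0.4970,
    "TargetP 2.0": 0.7396,
    "PredAlgo": 0.7738,
    "Chlamy_ChloroPred": 0.8462,
    "PB-Chlamy": 0.8521,
}


def to_binary(label: str, positive: str = "chloroplast") -> int:
    """Map any localization label to 1 (chloroplast) or 0 (everything else)."""
    return 1 if label == positive else 0


def binary_accuracy(
    predicted: Sequence[str],
    actual: Sequence[str],
    positive: str = "chloroplast",
) -> float:
    """Fraction of proteins whose collapsed prediction matches the collapsed truth."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        to_binary(p, positive) == to_binary(a, positive)
        for p, a in zip(predicted, actual)
    )
    return hits / len(predicted)


def accuracy_gap(reference: str, tool: str = "Chlamy_ChloroPred") -> float:
    """Absolute accuracy difference (tool minus reference) from the reported table."""
    return REPORTED_ACCURACY[tool] - REPORTED_ACCURACY[reference]
```

Under this collapse, a predictor that confuses two non-chloroplast compartments (for example, mitochondrion versus "other") is not penalized, which is why multi-class tools can be compared directly with a binary classifier.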

We developed a deep learning-based framework, Chlamy_ChloroPred, that integrates carefully designed neural layers with low computational complexity, achieving high predictive accuracy and interpretability. We believe that Chlamy_ChloroPred represents a compelling alternative to existing predictors, especially when accurate inference of chloroplast proteins is required.

## Linked entities

- **Species:** Chlamydomonas reinhardtii (taxon 3055), Arabidopsis thaliana (taxon 3702)

## Full-text entities

- **Genes:** GA1 (Terpenoid cyclases/Protein prenyltransferases superfamily protein) [NCBI Gene 828182] {aka ABC33, ARABIDOPSIS THALIANA ENT-COPALYL DIPHOSPHATE SYNTHETASE 1, ATCPS1, CPP synthase, CPS, CPS1}
- **Diseases:** TOC (MESH:C536164), D-HC (MESH:D014808), CP (MESH:D011488)
- **Chemicals:** CO2 (MESH:D002245), lipid (MESH:D008055), Amino acid (MESH:D000596), starch (MESH:D013213)
- **Species:** Chlamydomonas reinhardtii (species) [taxon 3055], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12968240/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12968240/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12968240/full.md

---
Source: https://tomesphere.com/paper/PMC12968240