# MoCETSE: A mixture-of-convolutional experts and transformer-based model for predicting Gram-negative bacterial secreted effectors

**Authors:** Hua Shi, Yihang Lin, Dachen Liu, Quan Zou

PMC · DOI: 10.1371/journal.pcbi.1013397 · 2026-03-11

## TL;DR

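MoCETSE is a deep learning model that predicts secreted effector proteins in Gram-negative bacteria directly from protein sequence. It combines ESM-1b embeddings, a target preprocessing network, and a transformer with relative positional encoding, and the authors report that it outperforms existing tools such as DeepSecE in cross-category prediction.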

## Contribution

MoCETSE combines a pre-trained protein language model (ESM-1b) with a target preprocessing network that refines task-relevant features and a transformer module that uses relative positional encoding, forming an end-to-end framework for effector protein prediction.
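
The title describes the model as a mixture of convolutional experts. As a rough illustration only, and assuming (this summary does not confirm it) that the target preprocessing network blends several convolutional branches through a learned gate, the idea can be sketched in plain Python. The kernels, gate parameters, and function names below are hypothetical and are not taken from the paper.

```python
import math

def conv1d_same(seq, kernel):
    """1D convolution over a list of floats with zero padding (output length = input length)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(seq):
                acc += w * seq[j]
        out.append(acc)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_conv_experts(seq, kernels, gate_weights):
    """Blend the outputs of several convolutional experts with a softmax gate.

    seq          : per-residue scalar feature (stand-in for one embedding channel)
    kernels      : one odd-length kernel per expert (hypothetical values)
    gate_weights : one (weight, bias) pair per expert; this toy gate only
                   looks at the mean feature value (an assumption)
    """
    mean_feat = sum(seq) / len(seq)
    gate = softmax([w * mean_feat + b for w, b in gate_weights])
    expert_outs = [conv1d_same(seq, k) for k in kernels]
    mixed = [
        sum(g * out[i] for g, out in zip(gate, expert_outs))
        for i in range(len(seq))
    ]
    return mixed, gate

# Toy input: six residues, three experts with receptive fields 1, 3, and 5.
features = [0.2, 1.0, -0.5, 0.3, 0.8, -0.1]
kernels = [[1.0], [0.25, 0.5, 0.25], [0.2, 0.2, 0.2, 0.2, 0.2]]
gate_weights = [(1.0, 0.0), (0.5, 0.1), (-0.5, 0.2)]
mixed, gate = mixture_of_conv_experts(features, kernels, gate_weights)
```

Experts with different kernel widths see different amounts of sequence context, and the gate decides how much each contributes. A real model would apply this per embedding channel with learned parameters.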

## Key findings

- In cross-category prediction, MoCETSE outperforms existing tools such as DeepSecE, and it performs robustly in both five-fold cross-validation and independent testing.
- Relative positional encoding lets the model capture associations between distant sequence positions while learning secretion signal features.
- Interpretability analysis reveals which sequence motifs and functional regions play core roles in distinguishing different types of effector proteins.

## Abstract

Identifying effector proteins of secretion systems in Gram-negative bacteria is crucial for deciphering their pathogenic mechanisms and guiding the development of antimicrobial strategies. Extracting evolutionary and sequence features using pre-trained protein language models (PLMs) has emerged as an effective approach to improve the performance of effector protein prediction. However, the high-dimensional features generated by PLMs contain extensive general biological information, making it difficult to focus on core features when applied directly to effector protein tasks, which in turn limits prediction performance. In this study, we propose MoCETSE, a deep learning model for predicting effector proteins in Gram-negative bacteria. Specifically, MoCETSE first extracts contextual representations of sequences using the pre-trained protein language model ESM-1b. Subsequently, it refines key functional features via a target preprocessing network to construct more expressive sequence representations. Finally, integrated with a transformer module incorporating relative positional encoding, MoCETSE explicitly models the relative spatial relationships between residues, enabling highly accurate prediction of secreted effector proteins. MoCETSE exhibits excellent and robust performance in both five-fold cross-validation and independent testing. Benchmark results demonstrate that it maintains strong competitiveness compared to existing binary and multi-class predictors. Additionally, the model can effectively perform genome-wide effector protein prediction, showing outstanding specificity and reliability. MoCETSE provides an efficient and robust computational framework for the accurate identification of bacterial effector substrates and offers key biological insights.
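
The abstract reports five-fold cross-validation. The sketch below shows one way such a split could be built so that each effector class is spread evenly across folds. Stratification, the class labels, the class counts, and the function name are assumptions made for illustration; the abstract does not state how the folds were constructed.

```python
import random

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly across k folds."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Shuffle within each class, then deal indices round-robin into folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the held-out test set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx

# Hypothetical label vector: 20 non-effectors and 10 each of two effector types.
labels = ["non-effector"] * 20 + ["T3SE"] * 10 + ["T4SE"] * 10
splits = list(stratified_k_fold(labels, k=5, seed=42))
```

With these toy counts, every test fold holds 8 sequences (4 non-effectors and 2 of each effector type), and each sequence is held out exactly once.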

Secreted effector proteins are a class of key virulence factors in Gram-negative bacteria. After being injected into host cells, they interfere with normal cellular functions, leading to the development of diseases. Accurate identification of these virulence proteins is crucial for understanding bacterial pathogenic mechanisms and developing therapeutic strategies. However, existing methods suffer from issues such as feature redundancy and insufficient capture of long-range dependency signals. Here, we developed a novel computational framework called MoCETSE that enables end-to-end intelligent prediction of effector proteins directly from raw protein sequence information. The model leverages a pre-trained protein language model to extract deep biological information from raw sequences; a target preprocessing network then refines the extracted information to focus on features most relevant to effector protein identification. During the learning of secretion signal features, we introduced relative positional encoding to effectively capture associations between distant positions in the sequence. In cross-category prediction, MoCETSE outperformed tools such as DeepSecE. Furthermore, we provide interpretable biological mechanisms supporting the model, revealing which key sequence motifs and functional regions play core roles in distinguishing different types of effector proteins.
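
To make the role of relative positional encoding concrete, here is a minimal sketch of one common variant: single-head attention in which a scalar bias, indexed by the clipped offset between two residues, is added to each attention logit. This is not necessarily the formulation used in MoCETSE; the bias values, the clipping distance, and the function name are assumptions.

```python
import math

def relative_attention(queries, keys, values, rel_bias, max_dist):
    """Single-head attention with a scalar bias per clipped relative offset.

    rel_bias maps every offset in [-max_dist, max_dist] to a scalar that is
    added to the attention logit; offsets beyond max_dist share the edge value.
    """
    n, d = len(queries), len(queries[0])
    outputs, weights = [], []
    for i in range(n):
        logits = []
        for j in range(n):
            # Scaled dot-product content score.
            dot = sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            # Relative offset, clipped to the supported range.
            offset = max(-max_dist, min(max_dist, j - i))
            logits.append(dot + rel_bias[offset])
        # Softmax over positions j.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Weighted sum of value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        )
    return outputs, weights

# Toy input: four residues with 2-dimensional features (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
# Hypothetical bias table that mildly penalizes distance, clipped at |offset| = 2.
rel_bias = {o: -0.1 * abs(o) for o in range(-2, 3)}
out, attn = relative_attention(x, x, x, rel_bias, max_dist=2)
```

Because the bias depends only on `j - i`, the same table applies wherever a motif occurs in the sequence. In a trained model the table would be learned rather than fixed.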

## Full-text entities

- **Diseases:** infection (MESH:D007239)
- **Species:** Pseudomonas syringae (species) [taxon 317], Escherichia coli (E. coli, species) [taxon 562], Legionella pneumophila subsp. pneumophila (subspecies) [taxon 91891], Salmonella enterica subsp. enterica serovar Typhimurium (no rank) [taxon 90371], Legionella pneumophila (species) [taxon 446], Pseudomonas aeruginosa (species) [taxon 287], Pseudomonas syringae pv. tomato (no rank) [taxon 323], Shigella (genus) [taxon 620], Citrobacter rodentium (species) [taxon 67825], Pseudomonas aeruginosa PAO1 (strain) [taxon 208964], Pseudomonas syringae pv. tomato str. DC3000 (strain) [taxon 223283]

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12998954/full.md

---
Source: https://tomesphere.com/paper/PMC12998954