# Taxonomic-Level Protein Quantification in Metaproteomics Using a Biomass-Constrained Expectation–Maximization Approach

**Authors:** Gelio Alves, Mehdi B. Hamaneh, Aleksey Y. Ogurtsov, Yi-Kuo Yu

PMC · DOI: 10.1021/jasms.5c00332 · Journal of the American Society for Mass Spectrometry · 2026-01-15

## TL;DR

This paper introduces a biomass-constrained expectation–maximization algorithm, integrated into the MiCId workflow, that quantifies taxon–protein pairs in microbial communities by addressing the shared peptide problem in metaproteomics.

## Contribution

The novel contribution is a biomass-constrained expectation–maximization algorithm, integrated into the MiCId workflow, that resolves the shared peptide problem in taxon–protein quantification.

## Key findings

- The algorithm accurately quantifies taxon–protein pairs in synthetic datasets with known species abundances.
- It effectively redistributes peptide counts among shared taxon–protein pairs in complex microbial datasets.
- Results from clinical stool datasets align with prior findings, confirming the method's accuracy in real-world microbiome analysis.

## Abstract

Microbiome communities are found across diverse environments and
play critical roles in both ecosystem function and human health. Mass-spectrometry-based
metaproteomics provides a powerful means for directly identifying
and quantifying microbial proteins. However, its application is hindered
by the shared peptide problem, where peptides map to multiple proteins
across taxa, complicating taxon–protein quantification. To
address this challenge, we extend a previously published modified
expectation–maximization algorithm that incorporates taxonomic
biomass constraints into the Microorganism Classification and Identification
(MiCId) workflow. This enhanced expectation–maximization algorithm
is used to quantify taxon–protein pairs derived from clusters
of identified taxon–protein pairs, thereby enabling more accurate
quantification and representation of taxonomic-level proteomes. The
performance of the approach is evaluated using synthetic datasets
consisting of simple mixtures with known relative species abundances,
a more complex 24-species synthetic dataset, and a clinical human
stool microbiome dataset. It is shown that, in simple synthetic datasets,
fold changes computed for species–protein pairs closely match
the expected values and are consistent with those obtained from MaxQuant.
Using the 24-species synthetic dataset, we show that the algorithm
accurately redistributes peptide extracted ion count among taxon–protein
pairs that share peptides. Finally, analyzing the clinical stool microbiome
dataset, we demonstrate that MiCId’s results are accurate and
consistent with previously reported findings. These results demonstrate
the robustness of MiCId’s algorithm for quantifying taxon–protein
pairs in complex microbial communities. By resolving the shared peptide
problem, the method enables accurate representation of taxonomic-level
proteomes, thereby advancing the application of metaproteomics in
microbiome research.
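
To make the shared peptide problem concrete, the following is a minimal sketch of an EM-style redistribution of peptide extracted ion counts among taxon–protein pairs. It is not MiCId's implementation: the function name `em_redistribute`, the 0/1 `membership` matrix, and the way `biomass` enters as a multiplicative weight in the E-step are all assumptions made for illustration, since this summary does not give the paper's actual constraint or update equations.

```python
import numpy as np


def em_redistribute(intensity, membership, biomass, n_iter=200, tol=1e-10):
    """Split peptide ion counts among the taxon-protein pairs they map to.

    intensity  : (P,) extracted ion count per peptide
    membership : (P, K) 0/1 matrix, 1 if peptide p maps to pair k
    biomass    : (K,) relative biomass of the taxon owning pair k
                 (hypothetical weighting, assumed for this sketch)
    Returns a (K,) array of ion counts assigned to each pair.
    """
    intensity = np.asarray(intensity, dtype=float)
    membership = np.asarray(membership, dtype=float)
    biomass = np.asarray(biomass, dtype=float)

    n_pairs = membership.shape[1]
    # Start from a uniform guess that preserves the total ion count.
    abundance = np.full(n_pairs, intensity.sum() / n_pairs)

    for _ in range(n_iter):
        # E-step: each peptide is shared among its candidate pairs in
        # proportion to (assumed biomass weight) x (current abundance).
        weights = membership * (biomass * abundance)
        row_sums = weights.sum(axis=1, keepdims=True)
        resp = np.divide(
            weights, row_sums, out=np.zeros_like(weights), where=row_sums > 0
        )
        # M-step: a pair's abundance is the ion count assigned to it.
        new_abundance = (resp * intensity[:, None]).sum(axis=0)
        converged = np.abs(new_abundance - abundance).max() < tol
        abundance = new_abundance
        if converged:
            break
    return abundance


# Toy input: one peptide unique to pair A, one unique to pair B, one shared.
toy_intensity = [90.0, 10.0, 100.0]
toy_membership = [[1, 0], [0, 1], [1, 1]]
toy_result = em_redistribute(toy_intensity, toy_membership, biomass=[1.0, 1.0])
```

In this toy case with equal weights, the shared peptide's count is split according to the evidence from the unique peptides rather than evenly, and the total ion count is preserved. Raising the assumed `biomass` weight of one pair shifts more of the shared count toward it, which is the qualitative effect a biomass constraint is meant to have.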

## Full-text entities

- **Genes:** CFP (complement factor properdin) [NCBI Gene 5199] {aka BFD, PFC, PFD, PROPERDIN}
- **Diseases:** obesity (MESH:D009765), CD (MESH:D003424), dental caries (MESH:D003731), IBD (MESH:D015212), neurodevelopmental disorders (MESH:D002658), type 2 diabetes (MESH:D003924), IDs (MESH:C535742), UC (MESH:D003093), periodontal disease (MESH:D010510), oral cancer (MESH:D009062), atherosclerosis (MESH:D050197), inflammatory (MESH:D007249), oral diseases (MESH:D009059)
- **Chemicals:** cysteine (MESH:D003545), DirectLFQ (-), carbohydrate (MESH:D002241), glucuronate (MESH:D020723), glycerol-3-phosphate (MESH:C029620)
- **Species:** Fibrobacter intestinalis (species) [taxon 28122], Agathobacter rectalis (species) [taxon 39491], Alistipes sp. (species) [taxon 1872444], Bacillus subtilis (species) [taxon 1423], Vibrio harveyi (species) [taxon 669], Blautia sp. (species) [taxon 1955243], Phocaeicola vulgatus (species) [taxon 821], Cellulophaga lytica (species) [taxon 979], Bordetella parapertussis (species) [taxon 519], Bacillus thuringiensis (species) [taxon 1428], Shigella flexneri (species) [taxon 623], Pseudomonas putida (species) [taxon 303], Roseovarius nubinhibens (species) [taxon 314263], Anaerobutyricum hallii (species) [taxon 39488], Salmonella bongori (species) [taxon 54736], Bifidobacterium sp. (species) [taxon 41200], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Phocaeicola sp. (species) [taxon 2773926], Phaeobacter inhibens (species) [taxon 221822], Pseudopedobacter saltans (species) [taxon 151895], Collinsella aerofaciens (species) [taxon 74426], Staphylococcus carnosus (species) [taxon 1281], Deinococcus proteolyticus (species) [taxon 55148], Roseobacter denitrificans (species) [taxon 2434], Bacillus cereus (species) [taxon 1396], Anaerostipes sp. (species) [taxon 1872530], Bacteroides thetaiotaomicron (species) [taxon 818], Escherichia coli (E. coli, species) [taxon 562], Allocoprococcus comes (species) [taxon 410072], gut metagenome (species) [taxon 749906], Marivirga tractuosa (species) [taxon 1006], Kineococcus radiotolerans (species) [taxon 131568], Sulfitobacter indolifex (species) [taxon 225422], Oceanicola granulosus (species) [taxon 252302], Deinococcus deserti (species) [taxon 310783], Homo sapiens (human, species) [taxon 9606], Deinococcus geothermalis (species) [taxon 68909], Philonthus vulgatus (species) [taxon 1896615], Sagittula stellata (species) [taxon 52603], Ruegeria pomeroyi (species) [taxon 89184], Faecalibacterium prausnitzii (species) [taxon 853]
- **Cell lines:** HM604 — Homo sapiens (Human), Induced pluripotent stem cell (CVCL_DQ08), HM541 — Homo sapiens (Human), Ataxia telangiectasia syndrome, Transformed cell line (CVCL_2563), HM609 — Homo sapiens (Human), Type 1 diabetes mellitus, Finite cell line (CVCL_GR91)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12879945/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12879945/full.md

## References

67 references — full list in the complete paper: https://tomesphere.com/paper/PMC12879945/full.md

---
Source: https://tomesphere.com/paper/PMC12879945