# Fast and robust estimate of bacterial genus novelty using the percentage of conserved proteins with unique matches (POCPu)

**Authors:** Charlie Pauvert, Thomas C.A. Hitch, Thomas Clavel

PMC · DOI: 10.7717/peerj.20259 · PeerJ · 2025-11-14

## TL;DR

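This paper introduces POCPu, a variant of the percentage of conserved proteins (POCP) that counts unique protein matches only. POCPu separates within-genus from between-genera comparisons better than POCP, and when computed with DIAMOND it runs 20x faster than the BLASTP reference.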

## Contribution

POCPu improves genus delineation by counting unique protein matches only. When computed with DIAMOND instead of BLASTP, it also requires fewer computational resources.
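For reference, POCP as originally proposed by Qin et al. (2014) is given below. $C_1$ is the number of conserved proteins of genome 1 when queried against genome 2, and $C_2$ is the same count in the reverse direction. $T_1$ and $T_2$ are the total protein counts of the two genomes. A protein is usually considered conserved if it has a match with an e-value below $10^{-5}$, more than 40% identity, and an alignable region covering more than 50% of the query. This summary does not spell out the POCPu formula. A minimal reading of "unique matches" is that each query protein is counted at most once, however many qualifying hits it has. Under that assumption, POCPu replaces $C_i$ with $|U_i|$, the number of distinct query proteins of genome $i$ that have at least one qualifying match:

$$
\mathrm{POCP} = \frac{C_{1} + C_{2}}{T_{1} + T_{2}} \times 100,
\qquad
\mathrm{POCPu} = \frac{|U_{1}| + |U_{2}|}{T_{1} + T_{2}} \times 100
$$

Because $|U_i| \le C_i$ whenever several hits per query can pass the filters, POCPu can only be equal to or lower than POCP computed from the same hit tables.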

## Key findings

- POCPu outperforms traditional POCP in distinguishing within-genus from between-genera comparisons.
- Computing POCP or POCPu with DIAMOND's very-sensitive setting is 20x faster than the reference BLASTP.
- For accurate genus assignment, certain families need POCPu thresholds that deviate from the reference 50% value.
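The speed gain comes from replacing BLASTP with DIAMOND in its very-sensitive mode. The sketch below only builds the shell commands for one directional comparison and does not run them. POCP needs both directions, so the same function would be called again with the two genomes swapped. The file names, the output columns, and the `--evalue 1e-5` pre-filter are assumptions chosen to match the classic POCP criteria, not parameters taken from the paper. The flags shown (`makedb --in/--db`, `blastp --query/--db/--out/--very-sensitive/--evalue/--outfmt`) are standard DIAMOND options.

```python
import shlex

# Assumed tabular output columns (BLAST outfmt 6 field names, also accepted by DIAMOND).
FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen", "slen", "evalue", "bitscore"]


def diamond_commands(query_faa: str, subject_faa: str,
                     db_prefix: str, out_tsv: str) -> list[list[str]]:
    """Build (but do not execute) the DIAMOND calls for one query -> subject search."""
    # Index the subject proteome once.
    makedb = ["diamond", "makedb", "--in", subject_faa, "--db", db_prefix]
    # Search the query proteome against it in very-sensitive mode.
    blastp = [
        "diamond", "blastp",
        "--query", query_faa,
        "--db", db_prefix,
        "--out", out_tsv,
        "--very-sensitive",   # sensitivity setting evaluated in the paper
        "--evalue", "1e-5",   # assumed pre-filter mirroring the POCP e-value cut-off
        "--outfmt", "6", *FIELDS,
    ]
    return [makedb, blastp]


# Hypothetical file names for genome A queried against genome B.
for cmd in diamond_commands("A.faa", "B.faa", "B_db", "A_vs_B.tsv"):
    print(shlex.join(cmd))
```

Returning argument lists instead of a single string keeps the commands safe to pass to `subprocess.run` without shell quoting issues.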

## Abstract

Accurate taxonomic assignment of bacterial genomes is essential for identifying novel taxa and for stable classification to enable robust comparison between studies. Bacterial genus delineation relies on multiple lines of evidence, including phylogenetic trees and metrics like the percentage of conserved proteins (POCP). POCP is widely used but requires benchmarking in terms of both computation and accuracy. We used 2,358,466 pairwise comparisons of proteomes derived from 4,767 genomes across 35 families to systematically assess the calculation of POCP and of the percentage of conserved proteins with unique matches (POCPu), which considers unique matches only. Both methods are 20x faster than the reference BLASTP when using the very-sensitive setting of DIAMOND. However, POCPu better differentiates within-genus from between-genera values, which improves bacterial genus assignment. This work facilitates comparative analysis of increasingly large numbers of genomes, providing a reliable metric to support genus delineation. The findings suggest that certain families need specific POCPu thresholds deviating from the reference 50% value.
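The difference between counting all matches (POCP) and unique matches (POCPu) can be sketched on toy hit tables. This is a minimal illustration, not the authors' implementation. It assumes that POCP counts every qualifying hit row, while POCPu counts each query protein at most once. It approximates the "alignable region > 50% of the query" criterion by alignment length over query length. The `Hit` record, the function names, and the toy data are all hypothetical. The genus cut-off defaults to the reference 50% but is left as a parameter, since the paper reports that some families need different values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Hit:
    """One row of a tabular BLASTP/DIAMOND result (hypothetical minimal schema)."""
    qseqid: str    # query protein ID
    sseqid: str    # subject protein ID
    pident: float  # percent identity
    length: int    # alignment length (residues)
    qlen: int      # query protein length (residues)
    evalue: float


def is_conserved(h: Hit) -> bool:
    # Classic POCP filters: e-value < 1e-5, identity > 40 %,
    # aligned length > 50 % of the query (approximation of the alignable region).
    return h.evalue < 1e-5 and h.pident > 40 and h.length > 0.5 * h.qlen


def count_conserved(hits: Iterable[Hit], unique: bool) -> int:
    passing = [h for h in hits if is_conserved(h)]
    if unique:
        return len({h.qseqid for h in passing})  # POCPu: one count per query protein
    return len(passing)                          # POCP: one count per qualifying hit


def pocp(hits_1v2, hits_2v1, total_1: int, total_2: int, unique: bool = False) -> float:
    """POCP (unique=False) or POCPu (unique=True) as a percentage."""
    c1 = count_conserved(hits_1v2, unique)
    c2 = count_conserved(hits_2v1, unique)
    return 100 * (c1 + c2) / (total_1 + total_2)


def same_genus(value: float, threshold: float = 50.0) -> bool:
    # Reference boundary is 50 %; family-specific thresholds can be passed instead.
    return value > threshold


# Toy data: genome A has 4 proteins, genome B has 5.
hits_ab = [
    Hit("a1", "b1", 85.0, 300, 320, 1e-50),
    Hit("a1", "b2", 60.0, 280, 320, 1e-30),  # second qualifying hit for a1
    Hit("a2", "b3", 45.0, 150, 200, 1e-10),
    Hit("a3", "b4", 30.0, 200, 210, 1e-8),   # fails the identity filter
]
hits_ba = [
    Hit("b1", "a1", 85.0, 300, 310, 1e-50),
    Hit("b3", "a2", 45.0, 150, 190, 1e-10),
]

pocp_val = pocp(hits_ab, hits_ba, total_1=4, total_2=5)
pocpu_val = pocp(hits_ab, hits_ba, total_1=4, total_2=5, unique=True)
print(f"POCP  = {pocp_val:.1f}%  same genus: {same_genus(pocp_val)}")   # 55.6%, True
print(f"POCPu = {pocpu_val:.1f}%  same genus: {same_genus(pocpu_val)}")  # 44.4%, False
```

In this toy case, the duplicate hit for protein `a1` pushes POCP above the 50% boundary, while POCPu stays below it. This illustrates how counting repeated matches can inflate the metric.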

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12622232/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12622232/full.md

## References

62 references — full list in the complete paper: https://tomesphere.com/paper/PMC12622232/full.md

---
Source: https://tomesphere.com/paper/PMC12622232