# A novel analysis workflow for simultaneous parsing prokaryotic and eukaryotic microbial genes from metagenomes

**Authors:** Wei Zhang, Yanmei Zheng, Guomin Han, Xingbing He

PMC · DOI: 10.7717/peerj.20769 · 2026-02-11

## TL;DR

This paper introduces a workflow that predicts eukaryotic genes with MetaEuk, masks them, and then predicts prokaryotic genes from the masked metagenome, yielding 14–18% more prokaryotic and viral genes than standalone prokaryotic predictors.

## Contribution

A novel analytical workflow that combines MetaEuk with MetaGeneMark (or metaProdigal) to predict both prokaryotic and eukaryotic genes from the same metagenome.

## Key findings

- The new workflow increased predicted prokaryotic and viral gene counts by 14–18% compared to standalone prokaryotic predictors.
- Validation on a mixed prokaryotic-eukaryotic metagenome showed that the workflow yielded genes with significantly higher average lengths, indicating reduced fragmentation and improved gene integrity.
- The workflow recovers eukaryotic genes in similar quantities and average lengths to MetaEuk alone while detecting more prokaryotic genes.
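
The two quantities behind these findings, the relative change in gene count and the average gene length, can be computed as in the minimal Python sketch below. The function names and all numbers are hypothetical and chosen only for illustration; none of them are taken from the paper.

```python
from typing import Sequence


def relative_increase(baseline: int, new: int) -> float:
    """Percentage change of `new` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (new - baseline) / baseline


def mean_length(lengths: Sequence[int]) -> float:
    """Average gene length; a higher mean suggests less fragmentation."""
    if not lengths:
        raise ValueError("no gene lengths given")
    return sum(lengths) / len(lengths)


# Toy values (assumptions, not results from the paper).
standalone_count = 1000
workflow_count = 1160
print(relative_increase(standalone_count, workflow_count))  # percent increase

standalone_lengths = [300, 450, 600]
workflow_lengths = [600, 750, 900]
print(mean_length(standalone_lengths), mean_length(workflow_lengths))
```

Comparing mean lengths alone does not establish significance; the paper's claim of significantly higher average lengths would rest on a statistical test that this summary does not describe.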

## Abstract

Accurately predicting coding genes from metagenomic samples containing a high proportion of eukaryotic content remains a significant challenge. Novel and reliable methods for the simultaneous prediction of prokaryotic and eukaryotic microbial genes are crucial to address this. We evaluated gene prediction accuracy of MetaGeneMark and MetaEuk using representative genomes from diverse organisms. Based on these findings, we developed an innovative analytical workflow. This approach involves an initial prediction of eukaryotic genes using MetaEuk, followed by the masking of these predicted eukaryotic genes and any co-identified partial prokaryotic genes using a custom Perl script. Remaining prokaryotic genes are then predicted from the masked metagenome using MetaGeneMark or metaProdigal. This integrated strategy achieved similar quantities and average lengths of eukaryotic genes compared to using MetaEuk alone. Notably, the quantity of predicted prokaryotic genes and viral genes using the new workflow was 14–18% higher than that obtained with standalone prokaryotic predictors. Furthermore, validation on a mixed prokaryotic-eukaryotic metagenome demonstrated that our workflow yielded genes with significantly higher average lengths, indicating reduced fragmentation and improved gene integrity. This novel workflow effectively enables the rapid and comprehensive retrieval of high-quality prokaryotic and eukaryotic coding sequences from diverse metagenomes.
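
The masking step described in the abstract can be illustrated with a short sketch. The authors use a custom Perl script whose details are not given here, so the Python below is only an assumption about the general idea: regions already assigned to predicted eukaryotic genes are overwritten with `N` so that a downstream prokaryotic predictor does not call genes inside them. The function name `mask_intervals`, the 1-based inclusive coordinate convention, and the example contig are all hypothetical.

```python
from typing import Iterable, Tuple


def mask_intervals(
    seq: str,
    intervals: Iterable[Tuple[int, int]],
    mask_char: str = "N",
) -> str:
    """Return `seq` with every 1-based inclusive (start, end) interval masked.

    Intervals may overlap or extend past the sequence ends; coordinates
    are clipped to the sequence. This is an illustrative stand-in, not
    the authors' Perl script.
    """
    chars = list(seq)
    for start, end in intervals:
        lo = max(start, 1) - 1        # convert to 0-based, clip at the left end
        hi = min(end, len(chars))     # exclusive upper bound, clip at the right end
        if hi > lo:
            chars[lo:hi] = [mask_char] * (hi - lo)
    return "".join(chars)


# Hypothetical contig with one predicted eukaryotic gene at positions 4..9.
contig = "ATGAAACCCGGGTTTTAA"
masked = mask_intervals(contig, [(4, 9)])
print(masked)  # ATGNNNNNNGGGTTTTAA
```

In a full pipeline the intervals would come from the eukaryotic predictor's annotation output, and the masked contigs would be written back to FASTA before running MetaGeneMark or metaProdigal. Handling of the co-identified partial prokaryotic genes mentioned in the abstract is not covered by this sketch.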

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12906264/full.md

---
Source: https://tomesphere.com/paper/PMC12906264