A novel analysis workflow for simultaneous parsing prokaryotic and eukaryotic microbial genes from metagenomes
Wei Zhang, Yanmei Zheng, Guomin Han, Xingbing He

TL;DR
This paper introduces a new workflow to accurately identify both prokaryotic and eukaryotic genes in metagenomic data, improving gene prediction quality and completeness.
Contribution
A novel analytical workflow combining MetaEuk and MetaGeneMark for simultaneous and improved prediction of prokaryotic and eukaryotic genes in metagenomes.
Findings
The new workflow increased predicted prokaryotic and viral gene counts by 14–18% compared to standalone prokaryotic predictors.
Validation showed the workflow produced longer, less fragmented genes with improved integrity in mixed metagenomes.
The approach maintains similar eukaryotic gene prediction performance to MetaEuk alone while enhancing prokaryotic gene detection.
Abstract
Accurately predicting coding genes from metagenomic samples containing a high proportion of eukaryotic content remains a significant challenge. Novel and reliable methods for the simultaneous prediction of prokaryotic and eukaryotic microbial genes are crucial to address this. We evaluated gene prediction accuracy of MetaGeneMark and MetaEuk using representative genomes from diverse organisms. Based on these findings, we developed an innovative analytical workflow. This approach involves an initial prediction of eukaryotic genes using MetaEuk, followed by the masking of these predicted eukaryotic genes and any co-identified partial prokaryotic genes using a custom Perl script. Remaining prokaryotic genes are then predicted from the masked metagenome using MetaGeneMark or metaProdigal. This integrated strategy achieved similar quantities and average lengths of eukaryotic genes compared…
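The masking step described in the abstract can be illustrated with a minimal sketch. The authors used a custom Perl script that is not shown here, so everything below is an assumption for illustration only: the function names (`merge_intervals`, `mask_contig`, `mask_metagenome`), the use of `N` as the masking character, and the 1-based inclusive coordinates (as in GFF output). Running MetaEuk and MetaGeneMark, and parsing their output files, is deliberately left out.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: 1-based, inclusive (start, end), as in GFF.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so each base is masked once."""
    merged: List[Interval] = []
    # Normalise reversed pairs (end < start) before sorting.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_contig(seq: str, intervals: Iterable[Interval],
                mask_char: str = "N") -> str:
    """Replace every base inside the given intervals with mask_char."""
    chars = list(seq)
    for start, end in merge_intervals(intervals):
        lo = max(start, 1) - 1        # convert to 0-based, clamp to contig start
        hi = min(end, len(chars))     # clamp to contig end
        if lo < hi:
            chars[lo:hi] = mask_char * (hi - lo)
    return "".join(chars)


def mask_metagenome(contigs: Dict[str, str],
                    genes: Dict[str, List[Interval]]) -> Dict[str, str]:
    """Mask predicted gene regions on each contig; others pass through unchanged."""
    return {name: mask_contig(seq, genes.get(name, []))
            for name, seq in contigs.items()}


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    contigs = {"contig_1": "ATGAAACCCGGGTTTTAA", "contig_2": "ACGTACGT"}
    eukaryotic_genes = {"contig_1": [(4, 9), (8, 12)]}
    for name, seq in mask_metagenome(contigs, eukaryotic_genes).items():
        print(name, seq)
```

In a pipeline of this shape, the masked contigs would then be written back to FASTA and passed to the prokaryotic predictor, so that regions already assigned to eukaryotic genes are not predicted a second time. Whether the original script masks with `N`, removes the regions, or handles strand information differently is not stated in the text above.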
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Bacteriophages and microbial interactions
