Widespread gene fusion artifacts in helminth genome annotations
Emma L. Collington, Andrew C. Doxey, Brendan J. McConkey, D. Moira Glerum

TL;DR
RNA-seq analysis across 20 helminth species indicates that most helminth-specific predicted fusion genes are likely gene prediction artifacts rather than genuine fusions.
Contribution
The study shows that most helminth-specific fusion genes are likely gene prediction artifacts rather than true fusions.
Findings
Helminth-specific fusion genes show a lack of correlation between the RNA-seq expression levels of their two fused domains.
These genes have longer interdomain regions and less continuous RNA-seq coverage.
They are not supported by de novo transcriptome assemblies, suggesting annotation errors.
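The first two findings can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes per-base RNA-seq read depth is already available for each gene as a plain list, that each domain is given as a half-open `(start, end)` interval on that list, and that the correlation is a Pearson correlation taken across genes. All function names and the `min_depth` threshold are hypothetical.

```python
from math import sqrt
from statistics import mean


def mean_coverage(coverage, start, end):
    """Mean per-base read depth over the half-open interval [start, end)."""
    window = coverage[start:end]
    return mean(window) if window else 0.0


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def domain_expression_correlation(genes):
    """Correlate first-domain and second-domain expression across genes.

    `genes` is a list of (coverage, domain1, domain2) tuples, where the
    domains are (start, end) intervals. This input layout is an assumption.
    """
    first = [mean_coverage(cov, *d1) for cov, d1, _ in genes]
    second = [mean_coverage(cov, *d2) for cov, _, d2 in genes]
    return pearson(first, second)


def interdomain_continuity(coverage, domain1, domain2, min_depth=1):
    """Fraction of interdomain bases covered at depth >= min_depth.

    Returns 1.0 when the two domains are directly adjacent.
    """
    linker = coverage[domain1[1]:domain2[0]]
    if not linker:
        return 1.0
    return sum(1 for depth in linker if depth >= min_depth) / len(linker)


# Toy data: three genes with uniform coverage across both domains.
uniform_genes = [
    ([10] * 30, (0, 10), (20, 30)),
    ([20] * 30, (0, 10), (20, 30)),
    ([5] * 30, (0, 10), (20, 30)),
]
r_uniform = domain_expression_correlation(uniform_genes)

# Toy data: a gene model whose interdomain region has no read support.
gapped = [10] * 10 + [0] * 10 + [10] * 10
gap_continuity = interdomain_continuity(gapped, (0, 10), (20, 30))
```

Under these assumptions, a set of genuine fusions would be expected to give a correlation near 1 and continuity near 1, whereas two separate genes merged by a gene predictor could show uncorrelated domain expression and a coverage gap between the domains.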
Abstract
Current helminth genomes possess thousands of predicted fusion genes, encoding novel protein domain architectures that are unique to these species. To investigate this, we analyzed 20,313 two-domain proteins annotated in current helminth genomes, of which 10,297 are apparently unique to helminths, and used RNA-seq data from 20 species of helminth to examine their plausibility as true fusion genes. For comparison, we analyzed a set of 400 high confidence, evolutionarily conserved domain fusions that are present in both helminth and non-helminth species. Our analysis suggests that, in contrast to genuine fusion genes, the majority of helminth-specific fusion genes in the 20 species investigated are likely gene prediction artifacts based on several criteria: (1) they show a lack of correlation between RNA-seq derived expression levels of the first and second “fused” domains, as well as…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics
