De novo genomic analyses for non-model organisms: an evaluation of methods across a multi-species data set
Sonal Singhal

TL;DR
This study evaluates de novo genomic analysis methods for non-model organisms using a multi-species transcriptomic dataset, highlighting challenges and proposing improvements to enhance data accuracy and reliability.
Contribution
It introduces a pipeline for analyzing de novo transcriptomic data in non-model organisms and assesses its effectiveness using novel metrics and an externally validated variant data set.
Findings
Careful data curation improves analysis accuracy
High-throughput sequencing (HTS) is effective for genomics in non-model organisms when paired with appropriate methods
Identifies key areas for methodological improvement
Abstract
High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to query variation at a genomic scale quickly and cheaply. Despite the increasing ease of obtaining such data, using them effectively still poses notable challenges, especially for those working with organisms that lack a high-quality reference genome. At every stage of analysis, from assembly to annotation to variant discovery, researchers must distinguish technical artifacts from the biological realities of their data before they can draw inferences. In this work, I explore these challenges by generating a large de novo comparative transcriptomic data set for a clade of lizards and constructing a pipeline to analyze these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas of…
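The abstract describes testing the pipeline against an externally validated variant data set. One common way to summarize such a comparison is concordance: the share of pipeline calls that the external data confirm (precision) and the share of validated variants the pipeline recovers (sensitivity, or recall). The Python sketch below is a generic illustration, not the paper's own metrics or code. The names `Variant` and `concordance` are hypothetical, and it assumes both call sets use the same de novo contig coordinates and that the validated set covers the same loci the pipeline called on.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A single-nucleotide variant on a de novo contig (hypothetical record layout)."""
    contig: str
    pos: int  # 1-based position on the contig
    ref: str
    alt: str


def concordance(called: set[Variant], validated: set[Variant]) -> dict[str, float]:
    """Compare pipeline calls against an externally validated variant set.

    Assumption: the validated set covers the same loci the pipeline called on,
    so any call absent from it is counted as a false positive.
    """
    tp = len(called & validated)   # calls confirmed by the external data
    fp = len(called - validated)   # calls with no external support
    fn = len(validated - called)   # validated variants the pipeline missed
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if validated else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


if __name__ == "__main__":
    called = {
        Variant("contig_1", 120, "A", "G"),
        Variant("contig_1", 305, "C", "T"),
        Variant("contig_2", 44, "G", "A"),   # not in the validated set
    }
    validated = {
        Variant("contig_1", 120, "A", "G"),
        Variant("contig_1", 305, "C", "T"),
        Variant("contig_3", 9, "T", "C"),    # missed by the pipeline
    }
    print(concordance(called, validated))
```

With these toy inputs, two of three calls are confirmed and two of three validated variants are recovered, so precision and recall are both 2/3. Representing variants as hashable tuples keeps the comparison to plain set operations; a real analysis would also need to reconcile coordinates between assemblies and restrict the comparison to sites the external method actually assayed.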
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Genetic Mapping and Diversity in Plants and Animals
