CruzDB: software for annotation of genomic intervals with UCSC genome-browser data
Brent S Pedersen, Ivana V Yang, Subhajyoti De

TL;DR
CruzDB is a fast, intuitive programmatic interface to the UCSC genome browser that supports integrative analyses of local and remotely hosted genomic datasets, so that genomic features can be interpreted in their biological context.
Contribution
The paper introduces CruzDB, software that simplifies and speeds up the annotation of genomic intervals with UCSC genome-browser data, facilitating integrative genomic analyses.
Findings
Exons replicate early, whereas introns tend to replicate late, suggesting a complex replication pattern within gene regions.
Variants associated with cognitive functions map to lincRNA transcripts of relevant function.
Lamina-associated domains are enriched in olfaction-related genes.
Abstract
The biological significance of genomic features is often context-dependent. We present CruzDB, a fast and intuitive programmatic interface to the UCSC genome browser that facilitates integrative analyses of diverse local and remotely hosted datasets. We showcase the syntax of CruzDB using miRNA-binding sites as examples, and further demonstrate its utility with three novel biological discoveries. First, we find that while exons replicate early, introns tend to replicate late, suggesting a complex replication pattern in gene regions. Second, variants associated with cognitive functions map to lincRNA transcripts of relevant function. Third, lamina-associated domains are highly enriched in olfaction-related genes. CruzDB is available at https://github.com/brentp/cruzdb
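To make "annotation of genomic intervals" concrete, the following is a minimal standard-library sketch of the underlying idea: for each query interval, report the features it overlaps. It does not use or reproduce the CruzDB API. The names `Feature`, `overlaps`, and `annotate` are hypothetical, the coordinates are made up, and intervals are assumed to be 0-based and half-open.

```python
from collections import namedtuple

# Hypothetical feature record (not a CruzDB type).
# Coordinates are assumed 0-based, half-open: [start, end).
Feature = namedtuple("Feature", "chrom start end name")


def overlaps(a_start, a_end, b_start, b_end):
    """Return True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate(intervals, features):
    """For each (chrom, start, end) query, list names of overlapping features."""
    annotated = []
    for chrom, start, end in intervals:
        hits = [
            f.name
            for f in features
            if f.chrom == chrom and overlaps(start, end, f.start, f.end)
        ]
        annotated.append((chrom, start, end, hits))
    return annotated


# Made-up coordinates for illustration only; not real annotation data.
features = [
    Feature("chr1", 100, 200, "geneA_exon1"),
    Feature("chr1", 200, 500, "geneA_intron1"),
    Feature("chr2", 50, 80, "lincRNA_X"),
]
queries = [("chr1", 150, 250), ("chr2", 80, 90), ("chr2", 60, 61)]
result = annotate(queries, features)

for chrom, start, end, hits in result:
    print(chrom, start, end, ",".join(hits) or "-")
```

The linear scan keeps the sketch short. With genome-scale tables, an indexed lookup (for example a sorted list with binary search, or an interval tree) would be the more realistic choice.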
Taxonomy
Topics: Genomics and Phylogenetic Studies · Cancer-related molecular mechanisms research · RNA modifications and cancer
