Contributions of the Petabyte Scale Sequence Search Codeathon toward efforts to scale sequence-based searches on SRA
Priyanka Ghosh, Kjiersten Fagnan, Ryan Connor, Ravinder Pannu, Travis J. Wheeler, Mihai Pop, C. Titus Brown, Tessa Pierce-Ward, Rob Patro, Jacquelyn S. Michaelis, Thomas L. Madden, Christiam Camacho, Olaitan I. Awe, Arianna I. Krinos, René KM Xavier, Rodrigo Ortega Polo

TL;DR
This paper discusses the outcomes of a virtual codeathon focused on developing methods and benchmarks for petabyte-scale sequence searches in the SRA, aiming to enhance large-scale genomic data analysis.
Contribution
The paper introduces new benchmarking approaches and community resources for petabyte-scale sequence search in metagenomics, with the goal of supporting scalable analysis methods.
Findings
Development of benchmarking approaches for large-scale metagenomic analysis
Creation of a public repository for reproducibility and community engagement
Identification of applications benefiting from SRA-wide sequence searches
Abstract
The volume of biological data being generated by the scientific community is growing exponentially, reflecting technological advances and research activities. The National Institutes of Health's (NIH) Sequence Read Archive (SRA), which is maintained by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), is a rapidly growing public database that researchers use to drive scientific discovery across all domains of life. This increase in available data has great promise for pushing scientific discovery but also introduces new challenges that scientific communities need to address. As genomic datasets have grown in scale and diversity, a parade of new methods and associated software have been developed to address the challenges posed by this growth. These methodological advances are vital for maximally leveraging the power of next-generation…
Taxonomy
Topics: Genomics and Rare Diseases · Genomics and Phylogenetic Studies · Biomedical Text Mining and Ontologies
