Low-bandwidth and non-compute intensive remote identification of microbes from raw sequencing reads
Laurent Gautier, Ole Lund

TL;DR
This paper introduces a low-bandwidth, computationally light system for identifying microbes from raw sequencing reads: a client sends a small sample of reads to a remote server and receives a list of matching reference genomes.
Contribution
It presents an approach that identifies microbes without a reference genome being specified in advance, using a distributed architecture that keeps data transfer and client-side computation small.
Findings
The system can identify microbes with minimal data transfer
The implemented web server indexes thousands of genomes
The client can run in a web browser on modest devices
Abstract
Cheap high-throughput DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approach to the analysis of sequencing data in which the reference genome does not have to be specified. Using a distributed architecture, we are able to query a remote server for hints about what the reference might be, transferring a relatively small amount of data, and the hints can be used for more computationally demanding work. Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references known to the server. Sequences for the…
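The client/server exchange described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the server indexes references or matches reads, so exact k-mer lookup is used here as a stand-in, and the k-mer length `K`, the function names, and the toy sequences are all hypothetical. Network transport is omitted; the "client" and "server" functions are simply called in the same process.

```python
import random
from collections import Counter, defaultdict

K = 8  # hypothetical k-mer length, kept small for this toy example


def kmers(seq, k=K):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k=K):
    """Server side: map each k-mer to the names of the references containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def sample_reads(reads, n, seed=0):
    """Client side: pick a small random subset of reads to send to the server."""
    rng = random.Random(seed)
    return rng.sample(reads, min(n, len(reads)))


def query(index, reads, k=K):
    """Server side: per reference, count the reads sharing at least one k-mer with it."""
    hits = Counter()
    for read in reads:
        matched = set()
        for kmer in kmers(read, k):
            matched |= index.get(kmer, set())
        hits.update(matched)
    return hits.most_common()


# Toy references (made-up sequences, not real genomes).
references = {
    "org_A": "GATTACAGGCTCAGTCCGATGCAAGCTTGACCGTAGGCATCGAT",
    "org_B": "ATTATAATTTAATATTAAATTTATATAATTATTAAT",
}
index = build_index(references)

# The client holds reads that happen to come from org_A, but does not know that.
org_a = references["org_A"]
reads = [org_a[i:i + 12] for i in range(0, len(org_a) - 12 + 1, 4)]

sample = sample_reads(reads, n=5)  # only this subset would cross the network
hits = query(index, sample)        # ranked list of candidate references
print(hits)
```

Only the sampled reads leave the client, which is where the bandwidth saving comes from; the ranked list returned by `query` plays the role of the "hints" that a client could then use for heavier analysis against the suggested references.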
