SeqManager: A Web-Based Tool for Efficient Sequencing Data Storage Management and Duplicate Detection
Margot Celerie (IGH), Andrew Oldfield (IGH), William Ritchie (IGH)

TL;DR
SeqManager is a web-based tool that efficiently manages large sequencing datasets by automatically detecting duplicates and removable intermediate files, reducing storage costs in genomics labs.
Contribution
The paper introduces a web application that automates sequencing data management, including duplicate detection and safe removal of intermediate files.
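To make "removable intermediate files" concrete, here is a minimal sketch of one possible rule: a file counts as a removable intermediate when a downstream file that supersedes it sits in the same directory. The summary does not describe how SeqManager classifies files, so the `SUPERSEDED_BY` table and the function name `find_removable_intermediates` are assumptions made purely for illustration.

```python
from pathlib import Path

# Hypothetical rules (not taken from SeqManager):
# intermediate suffix -> suffix of the file that supersedes it.
SUPERSEDED_BY = {
    ".sam": ".bam",        # alignment text kept next to its binary form
    ".fastq": ".fastq.gz", # uncompressed reads kept next to the compressed copy
}


def find_removable_intermediates(root):
    """Return files under root whose superseding file exists alongside them."""
    removable = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        for suffix, replacement in SUPERSEDED_BY.items():
            if path.name.endswith(suffix):
                stem = path.name[: -len(suffix)]
                # Flag the file only if the superseding file is present.
                if (path.parent / (stem + replacement)).is_file():
                    removable.append(path)
                break
    return sorted(removable)
```

A name match alone does not prove the downstream file is complete or uncorrupted, so a real tool would presumably validate it before deleting anything.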
Findings
Fast performance across multiple labs
Low memory footprint
Effective duplicate and intermediate file detection
Abstract
Motivation: Modern genomics laboratories generate massive volumes of sequencing data, often resulting in significant storage costs. Genomics storage often includes duplicate files, temporary processing files, and redundant intermediate data. Results: We developed SeqManager, a web-based application that provides automated identification, classification, and management of sequencing data files with intelligent duplicate detection. It also detects intermediate sequencing files that can safely be removed. Evaluation across four genomics laboratory settings demonstrates that our tool is fast and has a very low memory footprint.
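The abstract does not say how duplicates are detected. As an illustration only, the sketch below uses a common two-stage approach: group files by size first, then confirm candidates with a streamed SHA-256 digest so memory use stays low even for large files. The function names `file_digest` and `find_duplicates` and the chunk size are assumptions, not SeqManager's actual implementation.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so it is never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root that have identical content."""
    # Stage 1: files with different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Stage 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Grouping by size first avoids hashing files that cannot match. Note that compressed and uncompressed copies of the same reads would not be caught by byte-level hashing.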
Taxonomy
Topics: Genomics and Phylogenetic Studies · Cancer Genomics and Diagnostics · Genomics and Rare Diseases
