# subMG automates data submission for metagenomics studies

**Authors:** Tom Tubbesing, Andreas Schlüter, Alexander Sczyrba

PMC · DOI: 10.1186/s13040-025-00453-w · BioData Mining · 2025-06-05

## TL;DR

subMG is a tool that automates and simplifies the submission of metagenomics data to public archives, encouraging more complete and consistent data sharing.

## Contribution

subMG is an automated, user-friendly tool that streamlines the submission of metagenomics data to the European Nucleotide Archive (ENA).

## Key findings

- subMG reduces the time and expertise needed to submit metagenomics datasets by automating the process.
- The tool can be operated from the command line or through a graphical user interface (GUI), making it accessible to a wide range of users.
- subMG encourages more comprehensive data sharing, which can enhance future meta-analyses and comparative studies.

## Abstract

Publicly available metagenomics datasets are crucial for ensuring the reproducibility of scientific findings and supporting contemporary large-scale studies. However, submitting a comprehensive metagenomics dataset is both cumbersome and time-consuming. It requires including sample information, sequencing reads, assemblies, binned contigs, metagenome-assembled genomes (MAGs), and appropriate metadata. As a result, metagenomics studies are often published with incomplete datasets or, in some cases, without any data at all. subMG addresses this challenge by simplifying and automating the data submission process, thereby encouraging broader and more consistent data sharing.

subMG streamlines the process of submitting metagenomics study results to the European Nucleotide Archive (ENA) by allowing researchers to input files and metadata from their studies in a single form and automating downstream tasks that otherwise require extensive manual effort and expertise. The tool comes with comprehensive documentation as well as example data tailored for different use cases and can be operated via the command line or a graphical user interface (GUI), making it easily deployable to a wide range of potential users.

By simplifying the submission of genome-resolved metagenomics study datasets, subMG significantly reduces the time, effort, and expertise required from researchers, thus paving the way for more numerous and comprehensive data submissions in the future. An increased availability of well-documented and FAIR data can benefit future research, particularly in meta-analyses and comparative studies.
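To make the idea of a "single form" concrete, the components the abstract lists (sample information, sequencing reads, assemblies, binned contigs, MAGs, and metadata) can be pictured as one record that is checked for completeness before submission. The following is a minimal sketch under stated assumptions: `SubmissionForm`, its fields, and `missing_components` are hypothetical names chosen for illustration, and they do not reflect subMG's actual input format or API.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical data model for illustration only; NOT subMG's real form or API.
@dataclass
class SubmissionForm:
    study_title: str = ""
    samples: list = field(default_factory=list)   # sample identifiers
    reads: list = field(default_factory=list)     # paths to read files
    assembly: Optional[str] = None                # path to an assembly file
    bins: list = field(default_factory=list)      # paths to binned contigs
    mags: list = field(default_factory=list)      # paths to MAG files
    metadata: dict = field(default_factory=dict)  # free-form key/value metadata


def missing_components(form: SubmissionForm) -> list:
    """Return the names of dataset components that are still empty."""
    checks = {
        "sample information": form.samples,
        "sequencing reads": form.reads,
        "assembly": form.assembly,
        "binned contigs": form.bins,
        "MAGs": form.mags,
        "metadata": form.metadata,
    }
    # An empty list, empty dict, or None counts as missing.
    return [name for name, value in checks.items() if not value]


# Example with made-up file names: reads and an assembly, but no bins or MAGs yet.
form = SubmissionForm(
    study_title="Example metagenome study",
    samples=["sample_01"],
    reads=["sample_01_R1.fastq.gz", "sample_01_R2.fastq.gz"],
    assembly="assembly.fasta.gz",
)
print(missing_components(form))
```

A check of this kind illustrates how a partially filled form could be flagged before any upload begins, which is the sort of incomplete dataset the abstract describes as common.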

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], Enterobacteriaceae bacterium (species) [taxon 1849603]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12142852/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12142852/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12142852/full.md

---
Source: https://tomesphere.com/paper/PMC12142852