MetaHQ: Harmonized, high-quality metadata annotations of public omics samples and studies
Parker Hicks, Lydia E Valtadoros, Christopher A Mancuso, Faisal Alquadoomi, Kayla A Johnson, Sneha Sundar, and Arjun Krishnan

TL;DR
MetaHQ provides a harmonized, high-quality metadata resource for public omics samples, enabling easier data discovery and integration across diverse sources through a comprehensive database and user-friendly CLI.
Contribution
The paper introduces MetaHQ, a unified database and CLI tool that standardizes and consolidates metadata annotations from multiple omics data sources.
Findings
Nearly 200,000 annotations from 13 sources integrated
Accessible via Python CLI and web resources
Facilitates improved data reuse and discovery in omics research
Abstract
Public omics databases like the Gene Expression Omnibus and the Sequence Read Archive offer substantial opportunities for data reuse to address novel biomedical questions. However, it is still difficult to find samples and studies of interest since they are described by free-text metadata and lack standardized annotations. To address this issue, multiple research groups have undertaken curation efforts to add standardized annotations to large collections of these data, but these annotations are fragmented across online resources and are stored in different formats subject to varying standardization criteria, hindering the integration of annotations across sources. We developed MetaHQ to harmonize and distribute standardized metadata for public omics samples. MetaHQ comprises a database with nearly 200,000 annotations from 13 sources and a user-friendly command-line interface (CLI) to…
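The harmonization problem described in the abstract can be illustrated with a minimal sketch. It assumes two hypothetical curation sources that annotate samples with ontology terms in different shapes and identifier styles. The names source_a, source_b, normalize_term, and harmonize, the record layouts, and the sample accessions are illustrative assumptions, not MetaHQ's actual schema, API, or data.

```python
from collections import defaultdict

# Hypothetical inputs: two curation sources that describe the same kind of
# annotation in different shapes. Neither reflects MetaHQ's real schema.
source_a = [
    {"gsm": "GSM0000001", "tissue": "UBERON:0002107"},
    {"gsm": "GSM0000002", "tissue": "UBERON:0000955"},
]
source_b = {
    "GSM0000001": ["uberon_0002107", "MONDO_0005233"],
    "GSM0000003": ["mondo_0004992"],
}


def normalize_term(term):
    """Rewrite 'uberon_0002107' or 'UBERON:0002107' as 'UBERON:0002107'."""
    prefix, _, local_id = term.replace("_", ":").partition(":")
    return f"{prefix.upper()}:{local_id}"


def harmonize(rows_a, mapping_b):
    """Merge both sources into {sample_id: {term_id: set of source names}}."""
    merged = defaultdict(lambda: defaultdict(set))
    for row in rows_a:
        merged[row["gsm"]][normalize_term(row["tissue"])].add("source_a")
    for sample_id, terms in mapping_b.items():
        for term in terms:
            merged[sample_id][normalize_term(term)].add("source_b")
    return merged


annotations = harmonize(source_a, source_b)

# Print one line per (sample, term) pair with the sources that support it.
for sample_id in sorted(annotations):
    for term_id, sources in sorted(annotations[sample_id].items()):
        print(sample_id, term_id, ",".join(sorted(sources)))
```

Keeping the set of contributing sources for each sample and term pair is one possible design choice: it preserves provenance, so agreement between sources stays visible after merging.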
Taxonomy
Topics: Scientific Computing and Data Management · Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks
