OmicsQ: a user-friendly platform for interactive quantitative omics data analysis
Xuan-Tung Trinh, André Abrantes da Costa, David Bouyssié, Adelina Rogowska-Wrzesinska, Veit Schwämmle

TL;DR
OmicsQ is a web-based platform that simplifies the analysis of complex omics data, offering interactive tools for researchers without programming skills.
Contribution
OmicsQ introduces an accessible, integrated platform for omics data analysis with robust statistical tools and external application integration.
Findings
OmicsQ handles missing data without imputation, preserving data integrity.
The platform integrates with external tools for statistical testing and pathway analysis.
OmicsQ is broadly applicable across domains and accessible via a browser-based interface.
Abstract
High-throughput omics technologies generate complex datasets with thousands of features that are quantified across multiple experimental conditions, but often suffer from incomplete measurements, missing values, and individually fluctuating variances. This requires analytical tools for accurate, deep and insightful biological interpretation, capable of dealing with a large variety of data properties and different amounts of completeness. Software capable of handling such data complexity and integrating with external applications for downstream analysis remains rare and mostly relies on programming-based environments, limiting accessibility for researchers without computational expertise. We present OmicsQ, an interactive, web-based platform designed to streamline quantitative omics data analysis. OmicsQ provides an intuitive, browser-based visualization interface that integrates…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1- —Fundação para a Ciência e Tecnologia
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBioinformatics and Genomic Networks · Health, Environment, Cognitive Aging · Single-cell and spatial transcriptomics
1 Introduction
The rapid advancements in high-throughput omics technologies have generated an unprecedented volume of quantitative data, necessitating robust and scalable computational solutions for meaningful interpretation (Aebersold and Mann 2016). However, challenges such as missing values, batch effects, and proper data normalization complicate downstream analyses and can introduce biases in the biological interpretation if not handled correctly (Karpievitch et al. 2012).
Missing values frequently distort statistical inference in omics studies. Naïve imputation can impose artificial structure and reduce variance, thereby inflating false discovery rates (Karpievitch et al. 2012, Lazar et al. 2016). Analyses that operate directly on incomplete matrices avoid these biases, yet user-friendly, imputation-free tools remain scarce. Batch effects, systematic differences between acquisition runs, can mask genuine biological signals if left uncorrected (Leek et al. 2010). Widely used batch correction methods such as ComBat (Johnson et al. 2007) and limma’s removeBatchEffect (Ritchie et al. 2015) are effective, but they may over-adjust when batch and biology are confounded. Both methods extend beyond their microarray origins to general quantitative data. Normalization counters non-biological biases. Whether using global scaling, LOWESS regression or ANOVA-based models, one must strike a balance: to remove unwanted variation without suppressing true signal (Karpievitch et al. 2012). The detection of batch effects (Akulenko et al. 2016) and interactive visualization can show the necessity and impact of different normalization and batch correction methods and thus help with their careful application.
Given the myriad of tools for quantitative analysis, and particularly their availability as libraries of scripting languages like R and Python, end users without programming skills often rely on simplified solutions. Such software frequently omits recent and more powerful methods for data treatment, creating accessibility barriers for experimental researchers, and providing only a single, rigid workflow. Moreover, instead of relying on a one-workflow-fits-all solution, end users often prefer to interact with the data to test different hypotheses and to confirm a priori knowledge about the given biological system.
Similar approaches like Analyst Suites (Shah et al. 2020, Zhang et al. 2023, Hsiao et al. 2024) target downstream analysis of proteomics data but are tied to specific upstream pipelines. DAPAR/ProStaR (Wieczorek et al. 2017) provide a Shiny GUI around a rich proteomics toolkit including normalization, imputation, and differential testing, but analyses typically proceed on imputed data. Perseus (Tyanova et al. 2016) offers a broad desktop environment for proteomics with normalization, pattern recognition, imputation and time-series modules. mixOmics (Rohart et al. 2017) focuses on multivariate multi-omics integration and feature selection within R, rather than end-to-end QC and preprocessing in a GUI. MetaboAnalyst (Pang et al. 2024) is a mature web platform primarily for metabolomics with extensive normalization and imputation options. We believe that current tools lack modularity and the ability to extensively interrogate the data using modern statistical approaches including variance-sensitive clustering and analysis at the protein complex level.
OmicsQ is designed to address these challenges by offering an intuitive, interactive, browser-based platform that streamlines key data-processing steps and exports results to external applications for downstream analyses such as statistical testing, variance-sensitive clustering, and protein quantification. By integrating automated experimental design annotation, direct handling of missing values, and interactive quality control visualization, OmicsQ empowers researchers to focus on biological interpretation rather than overcoming computational hurdles. Additionally, OmicsQ seamlessly integrates with external applications for imputation-free statistical testing, clustering, and protein complex analysis, enabling a comprehensive workflow from data import to functional insights. This includes clustering applicable to incomplete data and statistical testing that includes a presence-absence model. By bridging the gap between computational complexity and usability, OmicsQ provides an accessible and reliable solution for quantitative omics data analysis.
2 Technical methods
OmicsQ was implemented as an interactive, user-friendly interface built with the Shiny framework, featuring real-time feedback through data summaries, projection onto principal components and correlation plots. This allows users to directly observe the impact of data-processing choices, and thus should enable informed decisions before continuing with in-depth analysis such as statistical testing and variance-sensitive clustering (Fig. 1).
Overview of OmicsQ features. Diagram illustrating the key steps in the OmicsQ workflow: data upload, experimental design annotation, pre-processing (including batch correction and missing value handling), and quality control, and interaction with PolySTest, VSClust and ComplexBrowser.
By using an interactive interface with automatic data property detection and extensive help features, OmicsQ aims to provide clear guidance through critical analysis steps, beginning with file import. Upon uploading Excel or CSV/TSV tables containing features quantified across different samples, such as experimental conditions and replicates, OmicsQ automatically detects file formats, and permits manual adjustments when necessary. Users then assign specific columns of the uploaded data matrix as identifiers or quantitative measurements for downstream analysis.
Annotation of experimental sample groups and replicates can be a cumbersome task when dealing with many samples. OmicsQ streamlines annotation through semi-automatic comparison of sample names using multiple string-distance metrics (e.g. Levenshtein, Jaro-Winkler). This automatically creates separate groups of samples with similar names from the file header. Users can interactively adjust thresholds to optimize grouping accuracy rapidly, significantly simplifying experimental design annotation. Samples with consistent naming are easily recognized and grouped with minimal manual effort, reducing the need for time-consuming and error-prone manual annotations across tens or even hundreds of samples.
Advanced pre-processing options include direct handling of missing values without requiring imputation, which would add potentially incorrect values to the dataset. Batch effects can be detected by calcBatchEffects method from the BEclear package (Akulenko et al. 2016). Users can then choose whether to perform batch correction in OmicsQ with two commonly used methods from the ComBat and limma R packages (Ritchie et al. 2015), and visually evaluate using an integrated principal component analysis (PCA) plot and a correlation heatmap that can be downloaded as pdf files. Furthermore, there are different options for filtering, summarization of main features and normalization. OmicsQ provides a summary of the dataset and its main features such as balancing, potential batch effects and dynamic range.
3 Features and external tools
A significant feature of OmicsQ is its integration with external specialized statistical analysis applications. These tools are specifically tailored to address the challenges posed by high variability and limited statistical power in omics data, being able to run without requiring data completeness or relying on imputation methods based on stringent assumptions.
To facilitate seamless interaction among these tools, OmicsQ uses JavaScript messaging with these applications. The default settings call the publicly available instances on https://computproteomics.bmb.sdu.dk. Alternatively, one can direct the in-depth analysis to local installations. Docker-based containerization of all five Shiny apps ensures straightforward local deployment.
The following methods and their Shiny applications are integrated with OmicsQ:PolySTest (Schwämmle et al. 2020) performs multi-group comparisons on incomplete matrices. Its Miss Test uses missingness patterns to detect differentially regulated features without imputation, allowing statistically rigorous hypothesis testing across standard experimental designs. PolySTest is also available as a Bioconductor R package.VSClust (Schwämmle and Jensen 2018) is a variance-sensitive clustering algorithm that identifies co-regulated features even in noisy, partially missing datasets. Distances are computed against complete cluster centres, and integrated variance estimation improves sensitivity to subtle expression trends.ComplexBrowser (Michalak et al. 2019) quantifies known protein complexes from subunit co-abundance. OmicsQ auto-converts gene or protein identifiers via the UniProt API (UniProt Consortium 2025), extending the method beyond proteomics to transcriptomics and other genomics-based data types.CoExpresso (Chalabi et al. 2019) (invoked from ComplexBrowser for human datasets) benchmarks observed co-regulation against >200 ProteomicsDB (Lautenbacher et al. 2022) tissues and >3000 cancer samples in PDC (Thangudu et al. 2024). The recent extension to include PDC allows more extensive complex validation and new hypothesis generation.
With the integration of above applications, users can seamlessly transfer pre-processed data directly to PolySTest for statistical testing, VSClust for variance-based clustering, and ComplexBrowser for protein-complex analysis. When dealing with human data, the general behavior of protein complexes can further be investigated in CoExpresso. Results from statistical testing and clustering can be retrieved and integrated back into OmicsQ, consolidating analyses within a unified environment. Furthermore, OmicsQ uses the String API (Szklarczyk et al. 2019) to run functional enrichment and visualize the respective protein–protein interaction networks.
With its interactivity and modularity, OmicsQ supports flexible, interactive and user-friendly quantitative analysis applicable to diverse quantitative omics studies, such as proteomics, transcriptomics, metabolomics, and associated data types. Moreover, the software features a tutorial including a representative workflow exemplifying the entire analysis.
4 Discussion and conclusion
OmicsQ offers an interactive and accessible platform that can be operated entirely in a web browser. This lowers the barrier for experimental researchers, clinicians, and students who may lack coding experience but still need to perform high-quality, quantitative omics analyses with confidence. For instance, a biologist analyzing proteomics data from a small cohort of patient samples can quickly upload their dataset, correct batch effects, inspect data variability, and proceed to pathway enrichment, without needing to write a single line of code. Unlike many other tools, OmicsQ does not rely on imputation, and thus preserves data integrity for more reliable downstream analyses, as it does not add potentially incorrect information and lead to misleading results from a significance analysis.
The platform is designed to promote exploratory data analysis, enabling users to iteratively test alternative hypotheses and make informed decisions throughout their workflow. The integration with multiple external applications allows for flexible extension into statistical testing, clustering, and protein complex analysis and will be maintained to upkeep with respective updates. Integration with additional external applications and community standards, e.g. for sample metadata, will be considered in the future.
A key strength of OmicsQ is its capacity to maintain a balance between automation and interactivity. Users can inspect each transformation step, review visual summaries (e.g. PCA, heatmaps), and trace parameter choices for enhanced reproducibility. The summary of processing steps ensures that users are aware of every operation performed during the analysis.
The interactive and modular setup of OmicsQ also allows integrated analysis of different omics types when the same samples are used for data acquisition. The robustness of combinatorial statistical testing in PolySTest and variance controlled clustering in VSClust allows analysis after aggregation of the results in the same data table, and so enables identification of common biological processes. Given the often very different nature of different omics technologies and difficulties in aligning their features, such applications will need to be executed carefully.
Despite its broad applicability, OmicsQ has some limitations. Scalability remains a challenge when dealing with very large datasets, both in terms of memory usage and response time of the interface. The current implementation is optimized for medium-sized datasets (tens to hundreds of samples), typical of many proteomics and transcriptomics studies. OmicsQ and its associated applications can process datasets with thousands of features at low computational cost and can analyze data with up to 100k features in under an hour on the server (AMD EPYC 9224 24-Core Processor, 256GB RAM). Work is ongoing to introduce more efficient data handling and scalable processing beyond simple multi-threading to improve performance.
In summary, OmicsQ provides a robust, user-friendly environment for analyzing quantitative omics data. It empowers researchers to carry out essential preprocessing, hypothesis testing, and functional interpretation tasks without requiring programming skills. By bridging usability with analytical rigor, OmicsQ contributes to more reproducible and insightful omics research workflows.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Aebersold R , Mann M. Mass-spectrometric exploration of proteome structure and function. Nature 2016;537:347–55.27629641 10.1038/nature 19949 · doi ↗ · pubmed ↗
- 2Akulenko R , Merl M, Helms V et al B Eclear: batch effect detection and adjustment in DNA methylation data. P Lo S One 2016;11:e 0159921.27559732 10.1371/journal.pone.0159921 PMC 4999208 · doi ↗ · pubmed ↗
- 3Chalabi MH , Tsiamis V, Käll L et al Co Expresso: assess the quantitative behavior of protein complexes in human cells. BMC Bioinformatics 2019;20:17.30626316 10.1186/s 12859-018-2573-8PMC 6327379 · doi ↗ · pubmed ↗
- 4Hsiao Y , Zhang H, Li GX et al Analysis and visualization of quantitative proteomics data using Frag Pipe-analyst. J Proteome Res 2024;23:4303–15.39254081 10.1021/acs.jproteome.4c 00294 PMC 13142904 · doi ↗ · pubmed ↗
- 5Johnson WE , Li C, Rabinovic A et al Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007;8:118–27.16632515 10.1093/biostatistics/kxj 037 · doi ↗ · pubmed ↗
- 6Karpievitch YV , Dabney AR, Smith RD et al Normalization and missing value imputation for label-free LC-MS analysis. BMC Bioinformatics 2012;13:S 5.10.1186/1471-2105-13-S 16-S 5PMC 348953423176322 · doi ↗ · pubmed ↗
- 7Lautenbacher L , Samaras P, Muller J et al Proteomics DB: toward a FAIR open-source resource for life-science research. Nucleic Acids Res 2022;50:D 1541–52.34791421 10.1093/nar/gkab 1026 PMC 8728203 · doi ↗ · pubmed ↗
- 8Lazar C , Gatto L, Ferro M et al Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. J Proteome Res 2016;15:1116–25.26906401 10.1021/acs.jproteome.5b 00981 · doi ↗ · pubmed ↗
