COBRAxy: constraint-based metabolic modeling in Galaxy

Francesco Lapi; Luca Milazzo; Lihao Lin; Isabella Cecilia Rizzo; Bruno Giovanni Galuzzi; Chiara Damiani

PMC · DOI:10.1093/bioinformatics/btaf670·December 19, 2025

COBRAxy: constraint-based metabolic modeling in Galaxy

Francesco Lapi, Luca Milazzo, Lihao Lin, Isabella Cecilia Rizzo, Bruno Giovanni Galuzzi, Chiara Damiani

PDF

Open Access

TL;DR

COBRAxy is a user-friendly tool for metabolic modeling that integrates with Galaxy, allowing accessible analysis of metabolic fluxes without requiring coding skills.

Contribution

COBRAxy introduces a Python-based, Galaxy-integrated tool for constraint-based metabolic modeling with features like flux sampling and transcriptomic integration.

Findings

01

COBRAxy enables flux sampling and integration of medium composition for refined predictions.

02

The tool provides a visual interface for comparing flux differences across biological samples.

03

COBRAxy makes advanced metabolic analysis accessible to non-programmers via the Galaxy platform.

Abstract

Metabolic network modeling is essential for understanding metabolic shifts occurring in complex physio-pathological processes. Currently, constraint-based modeling frameworks for metabolic networks primarily rely on Python or MATLAB libraries, requiring some coding skills. In contrast, more user-friendly tools lack essential features such as flux sampling or transcriptomic data integration. We introduce COBRAxy, a Python-based tool suite integrated into the Galaxy Project. COBRAxy enables constraint-based modeling and sampling techniques, allowing users to compute metabolic flux distributions for multiple biological samples. The tool also enables the integration of medium composition information to refine flux predictions. Additionally, COBRAxy provides a user-friendly interface for visualizing significant flux differences between populations on an enriched metabolic map. This…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes1

BRCA1

Proteins1

Species1

Homo sapiens(human · species)

Chemicals6

COBRAxy amino acids TCA lipid lactate nucleotide

Diseases2

breast cancer cancer

Figures1

Click any figure to enlarge with its caption.

Equations1

Funding4

—European Union—NextGenerationEU
—ELIXIRxNextGenerationIT
—National Biodiversity Future Center
—Google Summer of Code (GSoC)

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMicrobial Metabolic Engineering and Bioproduction · Metabolomics and Mass Spectrometry Studies · Bioinformatics and Genomic Networks

Full text

1 Introduction

Metabolism is a complex, dynamic system regulated at multiple levels, where enzyme activity, substrate availability, and interdependent pathways create non-linear interactions that -omics data alone cannot fully capture. Understanding metabolic shifts in health and disease requires tools that integrate multi-omics data and provide accessible insights, even for researchers without programming expertise.

However, there is a lack of accessible computational frameworks that enable researchers, particularly those without a strong programming background, to apply advanced techniques such as constraint-based modeling and sampling. State of the art libraries, such as COBRApy (Ebrahim et al. 2013) and COBRA Toolbox (Heirendt et al. 2019), require programming skills, which often limits their use in studying complex metabolic shifts in health and disease. On the contrary GUI-based tools such as Escher (King et al. 2015), Escher-FBA (Rowe et al. 2018), or FAME (Boele et al. 2012) do not enable integration of omics data into the constraint-based problem and do not support flux sampling, which is emerging as a powerful alternative to optimization methods (Herrmann et al. 2019). Other comprehensive and user-friendly solutions, such as CellNetAnalyzer (von Kamp et al. 2017) and CNApy (Thiele et al. 2022), offer extended functionality, but they also present certain limitations: the former requires a MATLAB installation and does not perform flux sampling, while the latter depends on Python and similarly lacks flux sampling capabilities.

In this context, the web-based bioinformatics platform Galaxy stands out as an ideal solution, promoting reproducible workflows and collaboration (The Galaxy Community 2024). Early efforts like MaREA (Graudenzi et al. 2018) and MaREA 2.0 (Ferrari et al. 2024) have leveraged Galaxy for metabolic studies, but accessible tools for constraint-based modeling and sampling remain scarce.

To address this need, we introduce the tool suite COBRAxy (Fig. 1). COBRAxy enables the integration of transcriptomics data with COBRA-based metabolic models, offering a comprehensive framework for studying metabolism in both health and disease. With COBRAxy, users can load and enrich metabolic models by incorporating transcriptomic data and adjusting the model’s medium conditions.

Using flux sampling, COBRAxy enables the estimation of metabolic flux distributions, offering valuable insights into reaction activity and flux patterns. Additionally, users can visualize statistically significant flux differences between populations on an enriched metabolic map.

By offering an intuitive and accessible platform for multi-omics integration and metabolic analysis, COBRAxy meets the growing need for tools that help researchers explore complex metabolic processes with ease.

2 COBRAxy

The first step, model preparation, allows users to upload genome-scale or context-specific metabolic models. With the tool Import Metabolic Model, users can load their own metabolic model and extract relevant information into a tabular file containing all the main data for further analysis, including gene-protein-reaction (GPR) rules, reaction bounds, and medium composition. The tool accepts an input file describing a metabolic network, including reactions, metabolites, genes, and bounds, for flux simulation. It supports the main formats commonly used for metabolic models, namely JSON (.json), SBML (.xml), MATLAB (.mat), and YAML (.yaml), as well as compressed versions (.zip, .gz, and.bz2). Notably, users should be aware that SBML files generated by MATLAB or other tools may not always be fully compatible with the Import Metabolic Model tool, which is based on COBRApy.

Pre-existing models, such as ENGRO2 (Di Filippo et al. 2022), or Recon3D (Brunk et al. 2018), can be used directly, bypassing the need to load a custom model, as they are already prepared for use in the subsequent tools. For completeness, we have also introduced the Export Metabolic Model tool, which allows users to export metabolic models from the tabular format into a standard model file format.

The next step involves quantifying the potential activity of the reactions in the metabolic model. The user can upload tabular data related to two or more transcriptomics profiles (single-cell or bulk) and map each gene expression profile to the corresponding metabolic model. This is achieved through Reaction Activity Scores (RAS) (Graudenzi et al. 2018), which are computed using the Expression2RAS tool. These scores are derived from gene expression data by leveraging the GPR rules embedded in the model. GPR rules link genes to reactions through Boolean expressions, where “OR” is interpreted as a sum (indicating redundancy) and “AND” as a minimum (indicating co-dependency). For each reaction, the RAS reflects the expression levels of its associated genes, as defined by these rules. The calculated RAS values are then normalized within the RAStoBounds tool. Normalization is performed by dividing each RAS value by the maximum RAS across all samples, ensuring that scores fall within a comparable range. These normalized RAS values are subsequently used to generate dynamic reaction bounds for the metabolic model. Specifically, the RAS values act as scaling factors to adjust the upper and lower bounds of each reaction, linking the model’s constraints directly to the underlying transcriptomics data. If a custom growth medium is specified, it can further refine the bounds by limiting exchange reactions based on the availability of extracellular metabolites.

COBRAxy graphical abstract.

Once set, these bounds define the model’s constraints, enabling exploration of the omics-customized region of feasible flux distributions. Additionally, users can export the transcriptionally derived metabolic models generated by RAStoBounds.

The Flux Simulation tool supports different state-of-the-art techniques to explore this region. They include optimization-based techniques, such as parsimonious-FBA (Lewis et al. 2010), Flux Variability Analysis (FVA) (Mahadevan and Schilling 2003), and sensitivity analysis via single reaction deletion, as well as flux sampling. Unlike optimization-based techniques, which provide a single optimal flux distribution, flux sampling generates multiple feasible solutions. This probabilistic approach captures flux variability and highlights alternative pathways, making it especially useful for studying complex metabolic systems. The tool can generate flux samples that satisfy the steady-state condition, meaning the system is balanced, using Corner-Based Sampling (CBS) (Galuzzi et al. 2024) or Improved Artificial Centering Hit-and-Run sampler (OPTGP) (Megchelenbrink et al. 2014). The number of flux samples is a customizable parameter: the user can specify how many flux solutions to sample and have the option to partition the sampled flux distribution into distinct batches. The tool provides summary statistics like mean, median, and quantiles for the sampled fluxes.

The final step uses the Metabolic Flux Enrichment Analysis tool to compare fluxes between two or more groups of samples. Users can provide the input in the form of a separated dataset of fluxes for each group or a unique dataset plus a file assigning each sample to a group. The tool supports three types of comparisons: comparing all possible pairs of groups (1 versus 1), comparing one group against all the others (1 versus All), or comparing each group individually against a control group (1 versus Control). Users can also select a metabolic map to visualize the results. This map must be consistent with the metabolic model and can be uploaded by the user.

The method performs a user-defined statistical test (e.g., the Kolmogorov–Smirnov test) to identify statistically significant differences, and calculates the fold change according to Equation (1):

[eqn]

where $[eqn]$ and $[eqn]$ are the average values of all the fluxes across the samples in the two comparison groups. By default, the P-value threshold is set to 0.05 and the fold change threshold to 1.2 (at least 20% difference), but these values can be adjusted by the user. Only reactions that pass both tests will be highlighted in the output map. The tool generates comparison maps based on the selected method. Arrows are colored sky blue or orange depending on whether the first group is down- or upregulated compared to the second one. If the flux values have opposite signs between two groups, indicating a different direction of usage of a reversible reaction, the corresponding arrow will be highlighted in red or blue. In addition, a map is provided for each group showing the average and median flux distributions. The color map is customizable by the user. In all maps, the arrow size varies: in the comparison maps, it is based on the fold change; in the average/median maps, it reflects the flux intensity.

A table is also provided for each comparison, containing the following values for each reaction: id, P-value, fold change [Equation (1)], z-score, average of the first group, and average of the second group.

Finally, the workflow enables seamless integration of results by exporting flux distributions, statistical analyses, and visualizations in various formats. The output is compatible with clustering tools, facilitating downstream analysis and interpretation.

3 Case study

To demonstrate the applicability of COBRAxy, we analyzed the TCGA‐BRCA breast cancer dataset released in (Ciriello et al. 2015), comprising RNA Seq V2 RSEM expression profiles of biopsies from 817 patients, with matched normal tissue profiles available for 112 of them, covering 20 440 genes.

The same dataset was previously used for data mapping with MaREA (Graudenzi et al. 2018), indicating a consistent upregulation of reactions in the glycolytic pathway but a less clear picture of mitochondrial pathways.

Using COBRAxy, we now integrated it into the constraints of the ENGRO2 metabolic model, which encompasses 395 metabolites, 469 reactions, and 498 genes.

The step-by-step tutorial of the analysis is reported in File 1, available as supplementary data at Bioinformatics online. The resulting map, reported as File 2, available as supplementary data at Bioinformatics online, provides a much more informative portray of the distinct metabolic programs of cancer and healthy cells.

Specifically, in cancer cells, the glycolytic flux and lactate production are significantly increased, consistent with the Warburg effect, indicating a shift toward aerobic glycolysis to meet the energy and anabolic demands of the tumor. The TCA cycle displays a non-canonical behavior in cancer cells, in line with experimental results reported in Arnold et al. (2022). Rather than functioning as a complete cycle, some reactions carry a sustained flux to support the biosynthesis of nucleotides, amino acids, and lipids, while others are reduced, reflecting a metabolic rewiring that aligns with the specific needs of tumor cells.

Additionally, the nucleotide and lipid biosynthesis pathways carry higher flux in cancer cells, underscoring their increased demand for these processes to support growth and proliferation.

This case study highlights the metabolic shifts that differentiate cancer cells from healthy cells and underscores the utility of COBRAxy in exploring complex metabolic systems.

4 Conclusion

The modular design and Galaxy integration of COBRAxy make it a versatile tool suite for investigating metabolic shifts in health, disease, and diverse environmental contexts. By integrating multi-omics data with constraint-based metabolic models, it allows for detailed insights into metabolic shifts. The case study demonstrated the tool’s ability to reveal distinct metabolic patterns between cancer and healthy cells, including altered glycolytic activity, non-canonical TCA cycle behavior, and upregulated biomass production pathways. These findings underscore the utility of COBRAxy in characterizing metabolic differences, offering a robust and accessible framework for researchers to explore and interpret metabolic dynamics in diverse biological contexts.

Supplementary Material

btaf670_Supplementary_Data

Bibliography19

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Arnold PK , Jackson BT, Paras KI et al A non-canonical tricarboxylic acid cycle underlies cellular identity. Nature 2022;603:477–81.35264789 10.1038/s 41586-022-04475-w PMC 8934290 · doi ↗ · pubmed ↗
2Boele J , Olivier BG, Teusink B et al Fame, the flux analysis and modeling environment. BMC Syst Biol 2012;6:8–5.22289213 10.1186/1752-0509-6-8PMC 3317868 · doi ↗ · pubmed ↗
3Brunk E , Sahoo S, Zielinski DC et al Recon 3d enables a three-dimensional view of gene variation in human metabolism. Nat Biotechnol 2018;36:272–81.29457794 10.1038/nbt.4072 PMC 5840010 · doi ↗ · pubmed ↗
4Ciriello G , Gatza ML, Beck AH et al; TCGA Research Network. Comprehensive molecular portraits of invasive lobular breast cancer. Cell 2015;163:506–19.26451490 10.1016/j.cell.2015.09.033PMC 4603750 · doi ↗ · pubmed ↗
5Di Filippo M , Pescini D, Galuzzi BG et al Integrate: model-based multi-omics data integration to characterize multi-level metabolic regulation. P Lo S Comput Biol 2022;18:e 1009337.35130273 10.1371/journal.pcbi.1009337 PMC 8853556 · doi ↗ · pubmed ↗
6Ebrahim A , Lerman JA, Palsson BO et al Cobrapy: constraints-based reconstruction and analysis for python. BMC Syst Biol 2013;7:74–6.23927696 10.1186/1752-0509-7-74PMC 3751080 · doi ↗ · pubmed ↗
7Ferrari M, Lapi F, Penati L et al Integrated visualization of metabolomics and transcriptomics with Galaxy. In: 19th conference on Computational Intelligence methods for Bioinformatics and Biostatistics (CIBB) Benevento, Italy, 2024.
8Galuzzi BG , Milazzo L, Damiani C et al Adjusting for false discoveries in constraint-based differential metabolic flux analysis. J Biomed Inform 2024;150:104597.38272432 10.1016/j.jbi.2024.104597 · doi ↗ · pubmed ↗