# Predicting the pathway involvement of metabolites annotated in the MetaCyc knowledgebase

**Authors:** Erik D. Huckvale, Hunter N. B. Moseley

PMC · DOI: 10.1186/s12859-025-06358-z · 2026-01-07

## TL;DR

This paper trains multilayer perceptron models on the MetaCyc knowledgebase to predict metabolite pathway associations, achieving performance comparable to models trained on KEGG.

## Contribution

The study demonstrates that MetaCyc can be used effectively for pathway prediction, and that MetaCyc-trained models outperform KEGG-trained models on metabolic pathways.

## Key findings

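- Models trained on MetaCyc achieved a mean MCC of 0.845 (standard deviation 0.0101), versus 0.847 (standard deviation 0.0098) for models trained on KEGG.
- Relative to KEGG's 184 metabolic-only pathways (mean MCC 0.800), MetaCyc showed a 5.6% improvement in metabolic pathway prediction.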
- The results indicate MetaCyc can be used at state-of-the-art performance levels for pathway prediction.
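The 5.6% figure is consistent with a relative change between the MetaCyc mean MCC (0.845) and KEGG's metabolic-only mean MCC (0.800). A minimal check, assuming the improvement is computed as (new - baseline) / baseline; the function name is illustrative, not from the paper:

```python
# Mean MCC values as reported in the abstract.
METACYC_MEAN_MCC = 0.845
KEGG_METABOLIC_MEAN_MCC = 0.800


def relative_improvement(new: float, baseline: float) -> float:
    """Return (new - baseline) / baseline, expressed as a percentage."""
    return (new - baseline) / baseline * 100.0


improvement = relative_improvement(METACYC_MEAN_MCC, KEGG_METABOLIC_MEAN_MCC)
print(f"{improvement:.1f}%")  # 5.6%
```

Expressed as an absolute difference instead, the same gap is 0.045 MCC.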

## Abstract

The associations of metabolites with biochemical pathways are highly useful information for interpreting molecular datasets generated in biological and biomedical research. However, such pathway annotations are sparse in most molecular datasets, limiting their utility for pathway-level interpretation. To address these shortcomings, several past publications have presented machine learning models for predicting the pathway associations of small biomolecules (metabolites and xenobiotics) using data from the Kyoto Encyclopedia of Genes and Genomes (KEGG). But other similar knowledgebases exist, such as MetaCyc, which has more compound entries and pathway definitions than KEGG.

As a logical next step, we trained and evaluated multilayer perceptron models on compound entries and pathway annotations obtained from MetaCyc. From the models trained on this dataset, we observed a mean Matthews correlation coefficient (MCC) of 0.845 with 0.0101 standard deviation, compared to a mean MCC of 0.847 with 0.0098 standard deviation for the KEGG dataset. However, KEGG’s 184 metabolic-only pathway predictions (out of 502 total pathways) have a mean MCC of 0.800 with 0.021 standard deviation. Since MetaCyc pathways are metabolic focused, the MetaCyc results represent over a 5.6% improvement in metabolic pathway prediction performance.

These overall MetaCyc and KEGG performance results are pragmatically the same, demonstrating that, in aggregate, the 4055 MetaCyc pathways can be effectively predicted at the current state-of-the-art performance level.

The online version contains supplementary material available at 10.1186/s12859-025-06358-z.
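The MCC values quoted above summarize a binary confusion matrix in a single number between -1 (total disagreement) and 1 (perfect prediction), with 0 indicating chance-level performance. Treating each metabolite-pathway association as a yes/no prediction, an MCC can be computed from true and predicted labels as in the sketch below. This is a generic, standard-library-only illustration, not the authors' evaluation code; the function names and example labels are hypothetical, and returning 0.0 when the denominator is zero is a common convention rather than something the paper specifies.

```python
import math
from typing import Sequence


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # MCC is undefined when any row or column of the confusion matrix
        # is empty; returning 0.0 is an assumed convention.
        return 0.0
    return numerator / denominator


def mcc_from_labels(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Count TP/TN/FP/FN over paired boolean labels, then compute MCC."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return mcc_from_counts(tp, tn, fp, fn)


# Hypothetical labels: is each compound associated with a given pathway?
truth = [True, True, False, False, True, False]
preds = [True, False, False, False, True, True]
print(round(mcc_from_labels(truth, preds), 3))  # 0.333
```

Unlike accuracy, MCC stays informative when positives are rare, which is typical when most compounds are not associated with most pathways.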

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12870939/full.md

---
Source: https://tomesphere.com/paper/PMC12870939