# A curated resource of chemolithoautotrophic genomes and marker genes for CO₂ fixation pathway prediction

**Authors:** Shuichi Kawashima, Yoko Okabeppu, Seiha Miyazawa, Natsuko Ichikawa, Hikaru Nagazumi, Yutaka Nishihara, Takeru Nakazato, Susumu Goto, Ken Kurokawa, Masaharu Ishii, Hiroshi Mori

PMC · DOI: 10.1038/s41597-026-06655-z · 2026-02-11

## TL;DR

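This paper presents AutoFixMark, a lightweight rule-based tool, together with curated KEGG Orthology marker genes and reference genome datasets, for predicting seven known CO₂ fixation pathways in microbial genomes. It achieves high precision and recall, particularly for pathways that are underrepresented in existing tools.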

## Contribution

A curated set of pathway-specific KEGG Orthology marker genes, two reference datasets of chemolithoautotrophic genomes, and a rule-based tool, AutoFixMark, that predicts the presence of seven known CO₂ fixation pathways.

## Key findings

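- In benchmarks against METABOLIC and gapseq, AutoFixMark achieves high precision and recall, particularly for CO₂ fixation pathways that are underrepresented in those tools.
- Two reference datasets were compiled: 347 manually curated chemolithoautotrophic genomes from 16 phyla, and 15 well-characterized chemolithoautotrophic genomes used to define pathway-specific marker genes.
- The curated gene sets, prediction rules, AutoFixMark program, and benchmark datasets are all publicly available.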
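
For readers less familiar with the benchmark metrics, the following is a minimal sketch of how precision and recall could be computed for a single pathway. It is not the paper's evaluation code: the function name `precision_recall` and the genome identifiers are hypothetical, and the numbers are made up purely for illustration.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one pathway.

    predicted: genomes in which a tool called the pathway present
    actual:    genomes in which the pathway is truly present (curated labels)
    """
    tp = len(predicted & actual)   # correctly predicted genomes
    fp = len(predicted - actual)   # predicted but not truly present
    fn = len(actual - predicted)   # truly present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical genome identifiers, not taken from the paper's datasets.
predicted = {"genome_1", "genome_2", "genome_3"}
actual = {"genome_1", "genome_2", "genome_4", "genome_5"}

p, r = precision_recall(predicted, actual)
print(f"precision={p:.2f} recall={r:.2f}")
```

A pathway that a tool rarely or never calls shows up as low recall under this definition, which is one way underrepresentation in a tool's rule set can be measured.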

## Abstract

Autotrophic CO₂ fixation is a fundamental metabolic process that enables microorganisms to inhabit carbon-limited environments. Multiple pathways mediate this process, with variants distributed across diverse taxa and some genes shared among pathways, making their identification from genomic data challenging. Here, we present a curated resource comprising pathway-specific KEGG Orthology marker genes and a lightweight, rule-based tool AutoFixMark for predicting the presence of seven known CO₂ fixation pathways in microbial genomes. To support marker gene identification and benchmarking, we compiled two reference datasets: (i) 347 manually curated chemolithoautotrophic genomes from 16 phyla, and (ii) a set of 15 well-characterized chemolithoautotrophic genomes used for defining pathway-specific marker genes. Using these marker genes, we developed AutoFixMark and evaluated its performance against two existing tools, METABOLIC and gapseq. Benchmarking results show that AutoFixMark achieves high precision and recall, particularly for pathways that are underrepresented in current tools. All curated gene sets, prediction rules, the AutoFixMark program, and benchmark datasets are publicly available, providing valuable resources for assessing autotrophic carbon fixation potential in microbial genomes.
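
The rule-based idea described in the abstract can be illustrated with a short sketch. This is an assumption-laden toy, not AutoFixMark's actual implementation or rule format: the pathway names, the KO identifiers (`K90001` and so on), the `RULES` structure, and the function `predict_pathways` are all hypothetical placeholders. The sketch assumes each pathway is defined by groups of marker KOs, and that a pathway is called present when every group has at least one hit in the genome's KO annotations.

```python
from typing import Dict, List, Set

# Hypothetical rules: pathway -> list of marker groups.
# Every group must be satisfied; any KO within a group counts as a hit.
# The KO identifiers are placeholders, not the paper's marker genes.
RULES: Dict[str, List[Set[str]]] = {
    "pathway_A": [{"K90001"}, {"K90002", "K90003"}],
    "pathway_B": [{"K90004"}, {"K90005"}, {"K90006"}],
}


def predict_pathways(
    genome_kos: Set[str], rules: Dict[str, List[Set[str]]]
) -> Dict[str, bool]:
    """Call a pathway present if each of its marker groups has a KO hit."""
    return {
        pathway: all(bool(group & genome_kos) for group in groups)
        for pathway, groups in rules.items()
    }


# KO annotations for one hypothetical genome.
genome = {"K90001", "K90003", "K90004"}
predictions = predict_pathways(genome, RULES)
print(predictions)  # {'pathway_A': True, 'pathway_B': False}
```

Requiring pathway-specific markers, rather than any gene in a pathway, is one way to avoid false calls caused by genes that are shared among pathways, which the abstract names as a core difficulty.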

## Full-text entities

- **Chemicals:** carbon (MESH:D002244), CO2 (MESH:D002245)

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12894919/full.md

---
Source: https://tomesphere.com/paper/PMC12894919