# TaoChongBao: a large-scale C. elegans missense variant database bridging worm and human genomes

**Authors:** Ming Li, Shimin Wang, Yongping Chai, Zhengyang Guo, Zi Wang, Zhe Chen, Kexin Lei, Jingyi Ke, Xingshen Huang, Guanghan Chen, Peng Huang, Kaiming Xu, Zijie Shen, Wei Li, Guangshuo Ou

PMC · DOI: 10.26508/lsa.202603631 · 2026-03-23

## TL;DR

TaoChongBao is a large database of C. elegans missense mutations that connects worm and human genomes for functional and comparative studies.

## Contribution

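The paper introduces TaoChongBao, a C. elegans missense variant database that expands mutation coverage over 20-fold relative to the Million Mutation Project and integrates AlphaMissense pathogenicity scores and ClinVar clinical annotations for comparative genomics.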

## Key findings

- Generated and sequenced 12,069 viable C. elegans strains by EMS mutagenesis, identifying 541,102 unique missense mutations across 20,914 genes.
- TaoChongBao integrates mutation data with AlphaMissense and ClinVar for functional and clinical insights.
- Database enables comparative analysis of conserved residues between C. elegans and human pathogenic sites.

## Abstract

Using EMS mutagenesis, we generated and sequenced 12,069 viable C. elegans strains, identifying 541,102 unique missense mutations. We developed TaoChongBao, an open-access database integrating mutation data with AlphaMissense predictions and ClinVar annotations, greatly expanding variant resources for functional residuomics and comparative analyses.

We generated and sequenced 12,069 viable Caenorhabditis elegans strains produced by ethyl methanesulfonate mutagenesis, identifying 20,315,536 variants, including 541,102 unique missense mutations across 20,914 genes. Most strains exhibit resistance to the anti-nematode drug ivermectin, whereas others display phenotypes such as dumpy morphology, uncoordinated movement, multivulva formation, and blistered cuticle. To organize and visualize this resource, we developed TaoChongBao, an open-access database and strain repository that integrates C. elegans mutation data with AlphaMissense-predicted pathogenicity scores and ClinVar clinical annotations. TaoChongBao enables users to explore worm missense variants, identify conserved residues corresponding to human pathogenic sites, and access viable strains for experimental validation. Compared with the previous Million Mutation Project in C. elegans, TaoChongBao expands mutation coverage over 20-fold and emphasizes amino acid–altering variants. This resource provides a scalable platform for functional residuomics, variant interpretation, and comparative analyses between C. elegans and human genomes.
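
The abstract describes identifying conserved worm residues that correspond to human pathogenic sites. The following is a minimal sketch of how such a lookup could work, not the TaoChongBao implementation. It assumes a precomputed pairwise alignment of a worm protein and its human ortholog (gaps written as `-`), a list of worm missense positions, and a set of human positions annotated as pathogenic. The function names, the toy sequences, and the positions are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def map_aligned_positions(worm_aln: str, human_aln: str) -> Dict[int, int]:
    """Map 1-based worm residue positions to 1-based human positions.

    Only alignment columns where both sequences have a residue are mapped.
    """
    if len(worm_aln) != len(human_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    worm_pos = human_pos = 0
    for w, h in zip(worm_aln, human_aln):
        if w != "-":
            worm_pos += 1
        if h != "-":
            human_pos += 1
        if w != "-" and h != "-":
            mapping[worm_pos] = human_pos
    return mapping


def conserved_pathogenic_hits(
    worm_aln: str,
    human_aln: str,
    worm_variant_positions: Iterable[int],
    human_pathogenic_positions: Set[int],
) -> List[Tuple[int, int, str]]:
    """Return (worm_pos, human_pos, residue) for worm variant sites that align
    to a human pathogenic position and carry the same reference residue."""
    mapping = map_aligned_positions(worm_aln, human_aln)
    worm_seq = worm_aln.replace("-", "")
    human_seq = human_aln.replace("-", "")
    hits: List[Tuple[int, int, str]] = []
    for wp in sorted(set(worm_variant_positions)):
        hp = mapping.get(wp)
        if hp is None or hp not in human_pathogenic_positions:
            continue  # unaligned column, or no pathogenic annotation
        if worm_seq[wp - 1] == human_seq[hp - 1]:
            hits.append((wp, hp, worm_seq[wp - 1]))
    return hits


# Toy alignment (hypothetical sequences, for illustration only).
toy_worm = "MKR-QLT"
toy_human = "MRRAQ-T"
toy_hits = conserved_pathogenic_hits(toy_worm, toy_human, [3, 4, 5], {3, 5})
```

In this toy case, worm positions 3 and 4 align to human positions 3 and 5, both residues are identical, and both human positions are in the pathogenic set, so they are reported. Worm position 5 falls opposite a gap and is skipped. Requiring residue identity is a deliberately strict choice; a real analysis might instead accept conservative substitutions.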

## Linked entities

- **Species:** Caenorhabditis elegans (taxon 6239)

## Full-text entities

- **Genes:** KIF1A (kinesin family member 1A) [NCBI Gene 547] {aka ATSV, C2orf20, HSN2C, MRD9, NESCAVS, SPG30}
- **Diseases:** uncoordinated movement (MESH:D009069), genetic disease (MESH:D030342), locomotor defects (MESH:D001523)
- **Chemicals:** chloroform (MESH:D002725), phenol (MESH:D019800), ethanol (MESH:D000431), HE (MESH:D006371), MgSO4 (MESH:D008278), ivermectin (MESH:D007559), agarose (MESH:D012685), KH2PO4 (-), TE (MESH:D013691), isoamyl alcohol (MESH:C029683), NaCl (MESH:D012965), EDTA (MESH:D004492), H2O (MESH:D014867), EMS (MESH:D005020), SDS (MESH:D012967)
- **Species:** Bacteria Latreille et al. 1825 (Bacteria stick insect, genus) [taxon 629395], Caenorhabditis elegans (species) [taxon 6239], Homo sapiens (human, species) [taxon 9606], C. elegans [taxon 328850]
- **Mutations:** R9Q, M9, C-to-A, R11Q, C for 2-3
- **Cell lines:** OP50 — Homo sapiens (Human), q11.2) BCR-ABL1, Cancer cell line (CVCL_DG77)

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13009681/full.md

---
Source: https://tomesphere.com/paper/PMC13009681