# Cluefish: mining the dark matter of transcriptional data series with over-representation analysis enhanced by aggregated biological prior knowledge

**Authors:** Ellis Franklin, Elise Billoir, Philippe Veber, Jérémie Ohanessian, Marie Laure Delignette-Muller, Sophie Martine Prud’homme

PMC · DOI: 10.1093/nargab/lqaf103 · NAR Genomics and Bioinformatics · 2025-07-30

## TL;DR

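Cluefish is a free, open-source R workflow for interpreting transcriptomic data series. It runs over-representation analysis on pre-clustered protein–protein interaction networks to surface biological functions that standard enrichment approaches overlook.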

## Contribution

Cluefish introduces a novel workflow for transcriptomic data series using cluster-based over-representation analysis enhanced by biological prior knowledge.

## Key findings

- Cluefish identified gene clusters deregulated at low doses of dibutyl phthalate in zebrafish.
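- The analysis suggests that retinoid signaling disruption may be the most sensitive pathway affected by dibutyl phthalate during zebrafish development, potentially leading to morphological changes.
- Combined with DRomics, Cluefish linked the low-dose clusters to biological functions that the standard approach overlooked.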

## Abstract

Interpreting transcriptomic data presents significant challenges, particularly in non-targeted approaches. While modern functional enrichment methods are well-suited for experimental designs involving two conditions, they are less applicable to data series. In this context, we developed Cluefish, a free and open-source, semi-automated R workflow designed for untargeted, comprehensive biological interpretation of transcriptomic data series. Cluefish applies over-representation analysis on pre-clustered protein–protein interaction networks, using clusters as anchors to identify smaller, more specific biological functions. Innovative features, including cluster merging and recovery of isolated genes through shared biological contexts, enable a more complete exploration of the data. We applied Cluefish to an in-house dataset with zebrafish exposed to a dose gradient of dibutyl phthalate and to two published toxicology datasets featuring different organisms. Combined with DRomics, a tool for dose–response analysis, Cluefish identified gene clusters deregulated at low doses and linked to biological functions overlooked by the standard approach. Notably, it revealed that retinoid signaling disruption may be the most sensitive pathway affected by dibutyl phthalate during zebrafish development, potentially leading to morphological changes. The Cluefish workflow aims to provide valuable clues for biological hypothesis generation and experimental validation. It is freely available at https://github.com/ellfran-7/cluefish.
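
To make the core idea of the abstract concrete, the following is a minimal sketch of over-representation analysis applied to a single gene cluster, using a one-sided hypergeometric test. It is not the Cluefish implementation (which is an R workflow); the function names `hypergeom_sf` and `ora_for_cluster`, the gene identifiers, and the annotation term are all hypothetical and chosen only for illustration. Multiple-testing correction, which a real analysis would need, is left out.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the annotation."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def ora_for_cluster(cluster_genes, term_to_genes, background):
    """Return {term: p-value} for one cluster against each annotation term.

    Assumption: `background` is the full set of genes that could have been
    selected; genes outside it are ignored.
    """
    background = set(background)
    cluster = set(cluster_genes) & background
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        overlap = len(cluster & annotated)
        if overlap == 0:
            continue  # nothing to test for this term
        results[term] = hypergeom_sf(
            overlap, len(background), len(annotated), len(cluster)
        )
    return results


# Toy data (hypothetical): 20 background genes, one term annotating 4 of them,
# and a 5-gene cluster that contains 3 of the annotated genes.
background = [f"g{i}" for i in range(20)]
term_to_genes = {"toy_term": ["g0", "g1", "g2", "g3"]}
cluster = ["g0", "g1", "g2", "g10", "g11"]

p_values = ora_for_cluster(cluster, term_to_genes, background)
print(p_values)
```

In a cluster-anchored workflow of the kind the abstract describes, a test like this would be repeated for every cluster of the protein–protein interaction network rather than once for the whole list of deregulated genes, so that smaller, more specific functions are not diluted by the full gene list.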

## Linked entities

- **Chemicals:** dibutyl phthalate (PubChem CID 3026)
- **Species:** Danio rerio (taxon 7955)

## Full-text entities

- **Genes:** gc (GC vitamin D binding protein) [NCBI Gene 436841] {aka zgc:110389, zgc:92753}, cyp26b1 (cytochrome P450, family 26, subfamily b, polypeptide 1) [NCBI Gene 324188] {aka fc21d03, wu:fc21d03, wu:fc26h10, zgc:76999}, cyp26c1 (cytochrome P450, family 26, subfamily C, polypeptide 1) [NCBI Gene 554036] {aka cyp26b1l, cyp26d1}, cyp26a1 (cytochrome P450, family 26, subfamily A, polypeptide 1) [NCBI Gene 30381] {aka CYP26, P450RAI, cb24, id:ibd5061, wu:fb81e05}
- **Diseases:** toxicity (MESH:D064420), MCL (MESH:D003027), spine (MESH:D016135), spinal deformities (MESH:D013122), endocrine disruption (MESH:D004700), DR (MESH:D018746), inflammation (MESH:D007249)
- **Chemicals:** acetonitrile (MESH:C032159), cholesterol (MESH:D002784), ethylene (MESH:C036216), superoxide (MESH:D013481), PFOA (MESH:C023036), Retinol (MESH:D014801), Poly(A) (MESH:D011061), PHE (MESH:C031181), ATRA (MESH:D014212), 11-cis Retinal (MESH:D012172), PBS (MESH:D007854), nitrogen (MESH:D009584), H2O (MESH:D014867), Sphingolipid (MESH:D013107), BP (MESH:C038809), steroid (MESH:D013256), lipid (MESH:D008055), 9-cis RA (MESH:D000077556), PFA (MESH:C003043), dibutyl phthalate (MESH:D003993), phthalates (MESH:C032279), retinoid (MESH:D012176)
- **Species:** Populus trichocarpa (black cottonwood, species) [taxon 3694], Homo sapiens (human, species) [taxon 9606], Danio rerio (leopard danio, species) [taxon 7955], Populus x canadensis (Canadian poplar, species) [taxon 3690], Rattus (rat, genus) [taxon 10114], Mus musculus (house mouse, species) [taxon 10090], Rattus norvegicus (brown rat, species) [taxon 10116]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12309373/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12309373/full.md

## References

107 references — full list in the complete paper: https://tomesphere.com/paper/PMC12309373/full.md

---
Source: https://tomesphere.com/paper/PMC12309373