# SpaFlow: a Nextflow pipeline for QC and clustering of MxIF datasets

**Authors:** Brenna C Novotny, Raymond Moore, Lynn Langit, David Haley, Rachel L Maus, Jun Jiang, Caitlin Ward, Ray Guo, Ellen L Goode, Svetomir N Markovic, Chen Wang

PMC · DOI: 10.1093/bioadv/vbaf032 · Bioinformatics Advances · 2025-02-14

## TL;DR

SpaFlow is a Nextflow pipeline that runs quality control, unsupervised clustering, and postclustering analysis on segmented, quantified MxIF data, enabling reproducible and scalable analysis of tissue microenvironments.

## Contribution

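SpaFlow introduces a meta-clustering approach that condenses clusters from multiple regions of interest (ROIs) into common meta-clusters, and it integrates three clustering and classification packages (Seurat, SCIMAP, and CELESTA) for MxIF data analysis.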
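To make the meta-clustering idea concrete, here is a minimal sketch of one way it could work. It is not SpaFlow's implementation. It assumes that each ROI has already been clustered on its own, that a cluster can be summarized by its mean marker expression, and that similar centroids from different ROIs can be merged by simple agglomerative clustering. The function names, the two-marker toy data, and the choice of Euclidean distance are all hypothetical.

```python
from collections import defaultdict
from math import dist
from statistics import fmean


def cluster_centroids(cells):
    """Average marker vectors per (roi, cluster) pair.

    cells: iterable of (roi, cluster_id, marker_vector) tuples.
    """
    groups = defaultdict(list)
    for roi, cluster_id, vec in cells:
        groups[(roi, cluster_id)].append(vec)
    return {key: [fmean(col) for col in zip(*vecs)] for key, vecs in groups.items()}


def meta_cluster(centroids, n_meta):
    """Merge per-ROI cluster centroids until n_meta groups remain.

    Assumption: greedy agglomerative merging on Euclidean distance between
    group centers. The published pipeline may use a different method.
    """
    groups = [[key] for key in sorted(centroids)]

    def center(group):
        return [fmean(col) for col in zip(*(centroids[k] for k in group))]

    while len(groups) > n_meta:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = dist(center(groups[i]), center(groups[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return {key: meta_id for meta_id, group in enumerate(groups) for key in group}


# Toy data: two ROIs, two hypothetical markers per cell.
# Cluster numbering is arbitrary within each ROI, as it would be in practice.
cells = [
    ("roi1", 0, [0.9, 0.1]), ("roi1", 0, [0.8, 0.2]), ("roi1", 1, [0.1, 0.9]),
    ("roi2", 0, [0.15, 0.85]), ("roi2", 1, [0.85, 0.1]),
]
labels = meta_cluster(cluster_centroids(cells), n_meta=2)
```

In this toy case, cluster 0 of `roi1` and cluster 1 of `roi2` share a meta-cluster because their centroids are close, even though their per-ROI cluster IDs differ. Annotating cell types once per meta-cluster, rather than once per cluster per ROI, is what reduces the manual labeling effort in large datasets.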

## Key findings

- In a case study of 297 ovarian tumor cores, SpaFlow identified biologically meaningful cell populations, including tumor-infiltrating lymphocytes.
- The pipeline consistently identifies cell populations across matched regions of interest in serial tonsil sections.

## Abstract

Multiplex immunofluorescence (MxIF) enables the quantification of multiple protein markers at a single-cell level while preserving spatial information, offering a powerful tool for studying tissue microenvironments. However, the flexibility in MxIF panel design poses challenges in standardizing cell phenotyping.

We present SpaFlow, an efficient, customizable pipeline for unsupervised clustering and classification of MxIF data, implemented using Nextflow. SpaFlow performs quality control, clustering, and postclustering analysis on segmented and quantified MxIF data, facilitating reproducible and scalable analyses across various computing platforms. The SpaFlow pipeline integrates three clustering and classification packages (Seurat, SCIMAP, and CELESTA), each providing unique methodologies for identifying cell types based on phenotypic markers. A novel “meta-clustering” approach condenses clusters across multiple regions of interest (ROIs) into common meta-clusters, streamlining the cell-type identification process in large datasets. SpaFlow’s robust quality control steps, including signal summation and cell density filtering, mitigate artifacts that may impact clustering accuracy. We demonstrate the utility of SpaFlow in a case study involving 297 ovarian tumor cores, where SpaFlow rapidly identified biologically meaningful cell populations, including tumor-infiltrating lymphocytes. Additionally, SpaFlow’s reproducibility is validated using serial tonsil sections, confirming its capability to consistently identify distinctive cell populations across matched ROIs.
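The two quality control steps named above, signal summation and cell density filtering, can be illustrated with a small sketch. This is a hedged example, not the pipeline's code. It assumes that signal summation means flagging cells whose summed marker intensity falls outside a user-chosen range, and that density filtering means dropping cells in sparsely populated grid bins. All function names, parameters, and threshold values are hypothetical.

```python
from collections import Counter


def signal_sum_flags(intensities, low, high):
    """True where a cell's summed marker intensity falls outside [low, high]."""
    return [not (low <= sum(vec) <= high) for vec in intensities]


def low_density_flags(coords, bin_size, min_cells):
    """True where a cell sits in a square grid bin holding fewer than min_cells cells."""
    bins = [(int(x // bin_size), int(y // bin_size)) for x, y in coords]
    counts = Counter(bins)
    return [counts[b] < min_cells for b in bins]


def qc_keep(intensities, coords, low, high, bin_size, min_cells):
    """Keep a cell only if it passes both checks.

    Density is counted on all cells, before signal outliers are removed.
    That ordering is an assumption of this sketch.
    """
    bad_signal = signal_sum_flags(intensities, low, high)
    sparse = low_density_flags(coords, bin_size, min_cells)
    return [not (s or d) for s, d in zip(bad_signal, sparse)]


# Toy ROI: five cells, two hypothetical markers each, with x/y centroids.
intensities = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [40.0, 60.0], [1.0, 1.0]]
coords = [(1.0, 1.0), (2.0, 1.5), (3.0, 4.0), (4.0, 2.0), (95.0, 95.0)]
keep = qc_keep(intensities, coords, low=0.5, high=10.0, bin_size=10.0, min_cells=3)
```

Here the fourth cell is dropped for an implausibly high summed signal (the kind of value a bright artifact might produce), and the fifth is dropped because it sits alone in its grid bin. In practice the thresholds would be tuned per panel and per dataset rather than fixed as above.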

SpaFlow is freely available with detailed documentation and examples at https://github.com/dimi-lab/SpaFlow.

## Linked entities

- **Diseases:** ovarian tumor (MONDO:0021068)

## Full-text entities

- **Diseases:** tumor (MESH:D009369), ovarian tumor (MESH:D010051)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11879158/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC11879158/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC11879158/full.md

---
Source: https://tomesphere.com/paper/PMC11879158