# AnnSQL: a Python SQL-based package for fast large-scale single-cell genomics analysis using minimal computational resources

**Authors:** Kenny Pavan, Arpiar Saunders

PMC · DOI: 10.1093/bioadv/vbaf105 · Bioinformatics Advances · 2025-05-05

## TL;DR

AnnSQL is a Python package that stores single-cell genomics data in an in-process DuckDB database and queries it with SQL, making analysis of atlas-scale datasets fast and feasible on a personal computer such as a laptop.

## Contribution

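AnnSQL introduces a computational framework that builds an AnnData-inspired database on the in-process DuckDB engine and exposes it through SQL, yielding orders-of-magnitude runtime improvements over AnnData or Seurat when parsing large single-cell genomics datasets.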

## Key findings

- On a 4.4 million cell single-nucleus RNA-seq dataset, AnnSQL operations ran in minutes on a laptop, whereas equivalent operations in AnnData or Seurat largely failed, or were ~700× slower, on a high-performance computing cluster.
- AnnSQL enables large-scale single-cell analysis on personal computers with minimal computational resources.

## Abstract

As single-cell genomics technologies continue to accelerate biological discovery, software tools that use elegant syntax and minimal computational resources to analyze atlas-scale datasets are increasingly needed. Here, we introduce AnnSQL, a Python package that constructs an AnnData-inspired database using the in-process DuckDB engine, enabling orders-of-magnitude performance enhancements for parsing single-cell genomics datasets with the ease of SQL. We highlight AnnSQL functionality and demonstrate transformative runtime improvements by comparing AnnData or AnnSQL operations on a 4.4 million cell single-nucleus RNA-seq dataset: AnnSQL-based operations were executed in minutes on a laptop for which equivalent operations in AnnData or Seurat largely failed (or were ∼700× slower) on a high-performance computing cluster. AnnSQL lowers computational barriers for large-scale single-cell/nucleus RNA-seq analysis on a personal computer, while demonstrating a promising computational infrastructure extendable for complete single-cell workflows across various genome-wide measurements.

AnnSQL is a pip-installable package available at https://github.com/ArpiarSaundersLab/annsql, with documentation at https://docs.annsql.com.
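
To make the idea of an "AnnData-inspired database" queried with SQL concrete, the following is a minimal sketch and not AnnSQL's actual API or schema. It assumes a toy layout with an `X` table (cells by genes) and an `obs` table (per-cell metadata), and it swaps in Python's standard-library `sqlite3` for DuckDB so the example runs without extra dependencies. The table layout, cell identifiers, gene column names, and values are all hypothetical.

```python
import sqlite3

# Toy stand-ins for AnnData's X (expression matrix) and obs (cell metadata).
# NOTE: sqlite3 replaces DuckDB here purely for a dependency-free sketch;
# the schema and all values below are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE X (cell_id TEXT PRIMARY KEY, gene_1 REAL, gene_2 REAL)")
con.execute("CREATE TABLE obs (cell_id TEXT PRIMARY KEY, cell_type TEXT)")

con.executemany(
    "INSERT INTO X VALUES (?, ?, ?)",
    [("cell_a", 0.0, 3.0), ("cell_b", 2.0, 1.0), ("cell_c", 4.0, 0.0)],
)
con.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [("cell_a", "neuron"), ("cell_b", "neuron"), ("cell_c", "astrocyte")],
)

# Mean expression per cell type: join expression to metadata, then aggregate.
rows = con.execute(
    """
    SELECT obs.cell_type,
           AVG(X.gene_1) AS mean_gene_1,
           AVG(X.gene_2) AS mean_gene_2
    FROM X
    JOIN obs USING (cell_id)
    GROUP BY obs.cell_type
    ORDER BY obs.cell_type
    """
).fetchall()

# Filtering cells by expression is a plain WHERE clause.
n_expressing = con.execute("SELECT COUNT(*) FROM X WHERE gene_1 > 1").fetchone()[0]

print(rows)
print(n_expressing)
```

The point of the sketch is the query style: filtering, joining expression to metadata, and aggregating are each a single declarative statement, and the database engine decides how to execute them rather than requiring the whole matrix to be loaded into memory first.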

## Full-text entities

- **Diseases:** Autism (MESH:D001321)
- **Species:** Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12098940/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12098940/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12098940/full.md

---
Source: https://tomesphere.com/paper/PMC12098940