DeepSeq: High-Throughput Single-Cell RNA Sequencing Data Labeling via Web Search-Augmented Agentic Generative AI Foundation Models
Saleem A. Al Dajani, Abel Sanchez, John R. Williams

TL;DR
This paper introduces DeepSeq, a web search-augmented agentic foundation model that automates single-cell RNA sequencing data labeling with up to 82.5% reported accuracy, enabling scalable biological data analysis.
Contribution
It presents a novel AI framework combining real-time web search with foundation models to automate and improve cell data annotation in genomics.
Findings
Achieves up to 82.5% labeling accuracy.
Automates large-scale cell annotation without manual curation.
Enables downstream tasks like cell-typing and perturbation prediction.
Abstract
Generative AI foundation models offer transformative potential for processing structured biological data, particularly in single-cell RNA sequencing, where datasets are rapidly scaling toward billions of cells. We propose the use of agentic foundation models with real-time web search to automate the labeling of experimental data, achieving up to 82.5% accuracy. This addresses a key bottleneck in supervised learning for structured omics data by increasing annotation throughput without manual curation and the human error it introduces. Our approach enables the development of virtual cell foundation models capable of downstream tasks such as cell-typing and perturbation prediction. As data volume grows, these models may surpass human performance in labeling, paving the way for reliable inference in large-scale perturbation screens. This application demonstrates domain-specific innovation in health…
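The abstract does not spell out the pipeline, so the following is only a minimal sketch of what a search-augmented labeling loop and its accuracy check could look like. Everything named here is an assumption made for illustration: `label_clusters`, `labeling_accuracy`, `toy_search`, and `toy_model` are hypothetical, the search and model calls are replaced by offline stubs, and exact-match accuracy against reference labels is only one plausible way a figure such as 82.5% might be computed.

```python
from typing import Callable, Dict, List

# Hypothetical lookup standing in for real-time web search results.
TOY_KNOWLEDGE = {"CD3E": "T cell", "MS4A1": "B cell", "CD14": "monocyte"}


def toy_search(query: str) -> str:
    """Stub retrieval step: return the first label whose gene appears in the query."""
    hits = [label for gene, label in TOY_KNOWLEDGE.items() if gene in query.split()]
    return hits[0] if hits else ""


def toy_model(prompt: str) -> str:
    """Stub for a foundation model: echo the retrieved evidence, or 'unknown'."""
    evidence = prompt.split("Evidence: ")[1].split("\n")[0]
    return evidence or "unknown"


def label_clusters(
    cluster_markers: Dict[str, List[str]],
    search: Callable[[str], str],
    model: Callable[[str], str],
) -> Dict[str, str]:
    """Label each cluster from its marker genes, using retrieved text as context."""
    labels = {}
    for cluster_id, markers in cluster_markers.items():
        query = "cell type markers " + " ".join(markers)
        evidence = search(query)  # assumed retrieval step
        prompt = f"Markers: {', '.join(markers)}\nEvidence: {evidence}\nCell type:"
        labels[cluster_id] = model(prompt).strip()
    return labels


def labeling_accuracy(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Fraction of shared clusters whose labels match, ignoring case (an assumed metric)."""
    shared = predicted.keys() & reference.keys()
    if not shared:
        raise ValueError("no clusters in common between predicted and reference")
    correct = sum(predicted[k].lower() == reference[k].lower() for k in shared)
    return correct / len(shared)


cluster_markers = {
    "c0": ["CD3E", "CD3D"],
    "c1": ["MS4A1", "CD79A"],
    "c2": ["LYZ", "CD14"],
    "c3": ["NKG7", "GNLY"],
}
reference = {"c0": "T cell", "c1": "B cell", "c2": "Monocyte", "c3": "NK cell"}

predicted = label_clusters(cluster_markers, toy_search, toy_model)
accuracy = labeling_accuracy(predicted, reference)
print(predicted)
print(f"accuracy = {accuracy:.2f}")
```

In this toy run the stub has no entry for the NK-cell markers, so cluster `c3` is labeled `unknown` and three of four labels match. Passing `search` and `model` as callables keeps the loop independent of any particular search API or model provider, which the source does not name.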
Taxonomy
Topics: Single-cell and spatial transcriptomics
