Retrieval-Augmented Generation for Predicting Cellular Responses to Gene Perturbation
Andrea Giuseppe Di Francesco, Andrea Rubbi, Pietro Liò

TL;DR
This paper introduces PT-RAG, a retrieval-augmented generation framework for predicting cellular responses to gene perturbations. It improves prediction accuracy by learning which biological context is relevant to retrieve.
Contribution
The paper extends retrieval-augmented generation to cellular biology, introducing a two-stage, differentiable retrieval method intended to improve prediction accuracy across cell types and perturbation contexts.
Findings
PT-RAG outperforms baseline models on the Replogle-Nadig dataset.
Differentiable, cell-type-aware retrieval is crucial for accurate predictions.
Naive retrieval methods can significantly degrade performance.
Abstract
Predicting how cells respond to genetic perturbations is fundamental to understanding gene function, disease mechanisms, and therapeutic development. While recent deep learning approaches have shown promise in modeling single-cell perturbation responses, they struggle to generalize across cell types and perturbation contexts due to limited contextual information during generation. We introduce PT-RAG (Perturbation-aware Two-stage Retrieval-Augmented Generation), a novel framework that extends Retrieval-Augmented Generation beyond traditional language-model applications to cellular biology. Unlike standard RAG systems designed for text retrieval with pre-trained LLMs, perturbation retrieval lacks established similarity metrics and requires learning what constitutes relevant context, making differentiable retrieval essential. PT-RAG addresses this through a two-stage pipeline: first,…
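The abstract's point about differentiable retrieval can be illustrated with a minimal sketch. This is not the authors' implementation, and it does not reconstruct the two stages, which are truncated above. It only contrasts a generic softmax-weighted ("soft") retrieval with hard top-k selection, assuming dot-product similarity between embeddings. All names (soft_retrieve, hard_retrieve, keys, values, query, temperature) are hypothetical.

```python
import numpy as np

def soft_retrieve(query, keys, values, temperature=1.0):
    """Softmax-weighted retrieval: the output is a smooth function of the scores."""
    scores = keys @ query / temperature            # (n_candidates,)
    scores = scores - scores.max()                 # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ values                     # weighted mix of candidate contexts
    return context, weights

def hard_retrieve(query, keys, values, k=1):
    """Top-k retrieval: the selected indices are piecewise constant in the scores,
    so they give no useful gradient to a learned scoring function."""
    scores = keys @ query
    top = np.argsort(-scores)[:k]                  # indices of the k highest scores
    return values[top].mean(axis=0), top

# Hypothetical toy data (assumptions, not from the paper):
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 4))     # embeddings of 5 candidate perturbations
values = rng.normal(size=(5, 3))   # context vector attached to each candidate
query = rng.normal(size=4)         # embedding of a (cell type, perturbation) pair

context, weights = soft_retrieve(query, keys, values, temperature=0.5)
```

In a sketch like this, lowering temperature pushes the soft weights toward the hard top-1 choice, while higher values spread weight over more candidates. In an autodiff framework, the same soft formulation would let a prediction loss update the embeddings that define relevance.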
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Cell Image Analysis Techniques · Single-cell and spatial transcriptomics
