HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis
Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F.K. Williamson, Ahrong Kim, Faisal Mahmood

TL;DR
HEST-1k is a dataset of 1,229 spatial transcriptomic profiles, each linked to an H&E-stained whole slide image and metadata, spanning 26 organs and two species. It is intended to support computational analysis and benchmarking that combine gene expression with tissue morphology.
Contribution
The paper introduces HEST-1k, a large, diverse dataset and accompanying tools that facilitate multimodal analysis of spatial transcriptomics and histology images, addressing previous limitations in scope and standardization.
Findings
Identified 2.1 million expression-morphology pairs
Mapped over 76 million nuclei across samples
Benchmarked foundation models for pathology
Abstract
Spatial transcriptomics (ST) enables interrogating the molecular composition of tissue with ever-increasing resolution and sensitivity. However, costs, rapidly evolving technology, and lack of standards have constrained computational methods in ST to narrow tasks and small cohorts. In addition, the underlying tissue morphology, as reflected by H&E-stained whole slide images (WSIs), encodes rich information often overlooked in ST studies. Here, we introduce HEST-1k, a collection of 1,229 spatial transcriptomic profiles, each linked to a WSI and extensive metadata. HEST-1k was assembled from 153 public and internal cohorts encompassing 26 organs, two species (Homo sapiens and Mus musculus), and 367 cancer samples from 25 cancer types. HEST-1k processing enabled the identification of 2.1 million expression-morphology pairs and over 76 million nuclei. To support its development, we additionally…
Taxonomy
Topics: Cancer-related molecular mechanisms research · Molecular Biology Techniques and Applications · Gene expression and cancer classification
