Learning biologically relevant features in a pathology foundation model using sparse autoencoders

Nhat Minh Le; Ciyue Shen; Neel Patel; Chintan Shah; Darpan Sanghavi; Blake Martin; Alfred Eng; Daniel Shenker; Harshith Padigela; Raymond Biju; Syed Ashar Javed; Jennifer Hipp; John Abel; Harsha Pokkalla; Sean Grullon; Dinkar Juyal

arXiv:2407.10785·eess.IV·December 18, 2024

Open Access

TL;DR

This paper demonstrates that sparse autoencoders trained on pathology foundation model embeddings can extract interpretable, biologically relevant features that correlate with cell types and improve robustness, aiding clinical interpretability.

Contribution

The study shows that sparse autoencoders can uncover biologically meaningful features from pathology foundation model embeddings. These features are specific to pathology-pretrained models and generalize to out-of-domain datasets.

Findings

01. SAE features correlate with counts of specific cell types, such as plasma cells and lymphocytes.

02. Biological features emerge across the model's depth and improve robustness.

03. Biologically grounded features generalize to out-of-domain datasets.
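
The first finding, that individual SAE dimensions track cell type counts, can be illustrated with a per-feature Pearson correlation. The following is a minimal sketch on synthetic data, not the authors' analysis: the array names (`sae_acts`, `plasma_counts`), the sizes, and the planted signal in feature 3 are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tiles, n_features = 500, 8  # hypothetical sizes

# Synthetic stand-ins: per-tile plasma cell counts and SAE activations.
plasma_counts = rng.poisson(lam=5.0, size=n_tiles).astype(float)
sae_acts = rng.random((n_tiles, n_features))

# Assumption for the demo: feature 3 is constructed to track the counts.
planted = 0.1 * plasma_counts + 0.01 * rng.normal(size=n_tiles)
sae_acts[:, 3] = np.maximum(planted, 0.0)  # keep activations non-negative


def pearson_per_feature(acts, counts):
    """Pearson correlation between each activation column and the counts."""
    a = acts - acts.mean(axis=0)
    c = counts - counts.mean()
    denom = np.sqrt((a ** 2).sum(axis=0) * (c ** 2).sum())
    return (a * c[:, None]).sum(axis=0) / denom


r = pearson_per_feature(sae_acts, plasma_counts)
best = int(np.argmax(np.abs(r)))  # index of the most count-correlated feature
print("most correlated feature:", best)
```

On real data, the counts would come from a separate cell detection or annotation step, and a high correlation for one dimension would suggest, but not prove, that the dimension encodes that cell type.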

Abstract

Pathology plays an important role in disease diagnosis, treatment decision-making and drug development. Previous works on interpretability for machine learning models on pathology images have revolved around methods such as attention value visualization and deriving human-interpretable features from model heatmaps. Mechanistic interpretability is an emerging area of model interpretability that focuses on reverse-engineering neural networks. Sparse Autoencoders (SAEs) have emerged as a promising direction in terms of extracting monosemantic features from polysemantic model activations. In this work, we trained a Sparse Autoencoder on the embeddings of a pathology pretrained foundation model. We found that Sparse Autoencoder features represent interpretable and monosemantic biological concepts. In particular, individual SAE dimensions showed strong correlations with cell type counts such…
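
To make the setup in the abstract concrete, here is a minimal sketch of a sparse autoencoder trained on embedding vectors. It is written under stated assumptions and is not the paper's implementation: the embeddings are random stand-ins, and the layer sizes, the L1 coefficient `l1_coeff`, the learning rate `lr`, and the plain gradient descent loop are all hypothetical choices. The encoder is a ReLU layer into an overcomplete hidden space, the decoder is linear, and the loss combines reconstruction error with an L1 sparsity penalty on the hidden activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for foundation-model embeddings (hypothetical sizes).
n_tiles, d_embed, d_hidden = 256, 16, 64
x = rng.normal(size=(n_tiles, d_embed))

# SAE parameters; the hidden layer is overcomplete (d_hidden > d_embed).
W_enc = rng.normal(scale=0.1, size=(d_embed, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d_hidden, d_embed))
b_dec = np.zeros(d_embed)

l1_coeff = 1e-2  # assumed sparsity weight
lr = 1e-2        # assumed learning rate


def forward(x):
    """Encode with ReLU, decode linearly."""
    pre = x @ W_enc + b_enc
    f = np.maximum(pre, 0.0)   # sparse, non-negative features
    x_hat = f @ W_dec + b_dec  # reconstruction of the embedding
    return pre, f, x_hat


def loss_fn(x, f, x_hat):
    """Mean squared reconstruction error plus L1 penalty on features."""
    recon = np.mean(np.sum((x_hat - x) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return recon + l1_coeff * sparsity


losses = []
for step in range(200):
    pre, f, x_hat = forward(x)
    losses.append(loss_fn(x, f, x_hat))

    # Manual gradients of the loss (f >= 0, so d|f|/df = 1 where active).
    d_xhat = 2.0 * (x_hat - x) / n_tiles
    d_f = d_xhat @ W_dec.T + l1_coeff / n_tiles
    d_pre = d_f * (pre > 0)

    # Plain gradient descent updates.
    W_dec -= lr * (f.T @ d_xhat)
    b_dec -= lr * d_xhat.sum(axis=0)
    W_enc -= lr * (x.T @ d_pre)
    b_enc -= lr * d_pre.sum(axis=0)

_, features, x_rec = forward(x)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

After training, each column of `features` is one candidate SAE dimension whose activations across tiles can be inspected or compared with biological annotations.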

Taxonomy

Topics: Biomedical Text Mining and Ontologies

Methods: Softmax · Attention Is All You Need · Sparse Autoencoder