Using Deep Learning Models Pretrained by Self-Supervised Learning for Protein Localization
Ben Isselmann, Dilara Göksu, Heinz Neumann, Andreas Weinmann

TL;DR
This study shows that self-supervised learning models pretrained on ImageNet-1k or HPA FOV transfer effectively to protein localization on OpenCell microscopy images, even without fine-tuning.
Contribution
It evaluates how well SSL models pretrained on ImageNet-1k and HPA FOV generalize to OpenCell microscopy data, highlighting strong performance both zero-shot and after fine-tuning.
Findings
The HPA FOV-pretrained DINO ViT achieves the highest zero-shot macro F1, 0.822.
Fine-tuning improves performance to a macro F1 of 0.860.
Single-cell embeddings from HPA pretraining outperform others in k-NN tasks.
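The findings above are reported as macro F1 and k-NN results on embeddings. As a minimal sketch of what those two quantities mean, the snippet below classifies toy embedding vectors with a cosine-similarity k-NN vote and scores the predictions with macro F1 (the unweighted mean of per-class F1). The 2-D vectors, the label names, the cosine metric, and k=3 are illustrative assumptions; this summary does not state the paper's actual evaluation protocol.

```python
import math
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training embeddings most similar to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: cosine(pair[0], query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy "embeddings" (hypothetical values, not data from the paper).
train_x = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_y = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
test_x = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.3]]
test_y = ["nucleus", "cytoplasm", "cytoplasm"]

preds = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
score = macro_f1(test_y, preds)
print(preds, round(score, 3))
```

Because macro F1 averages per-class scores without weighting by class size, rare localization classes count as much as common ones, which is why it is a common choice for imbalanced label sets.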
Abstract
Background: Task-specific microscopy datasets are often small, making it difficult to train deep learning models that learn robust features. While self-supervised learning (SSL) has shown promise through pretraining on large, domain-specific datasets, generalizability across datasets with differing staining protocols and channel configurations remains underexplored. We investigated the generalizability of SSL models pretrained on ImageNet-1k and HPA FOV, evaluating their embeddings on OpenCell with and without fine-tuning, two channel-mismatch strategies, and varying fine-tuning data fractions. We additionally analyzed single-cell embeddings on a labeled OpenCell subset. Result: DINO-based ViT backbones pretrained on HPA FOV or ImageNet-1k transfer well to OpenCell even without fine-tuning. The HPA FOV-pretrained model achieved the highest zero-shot performance (macro 0.822…
