TL;DR
C3R introduces a novel framework for microscopy image analysis that handles varying channel configurations, enabling unified evaluation across diverse datasets and improving cross-dataset generalization without retraining.
Contribution
The paper proposes the C3R framework, which combines a context-concept channel grouping with a masked knowledge distillation scheme to learn robust, channel-adaptive cell representations from microscopy images.
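A minimal sketch of context-concept channel grouping follows. It assumes images are stored channel-first as nested lists and that every channel is identified by a name. The assay names, channel names, role assignments in `ASSAY_GROUPS`, and the helper `split_context_concept` are all hypothetical and are not taken from the paper. As the reviews point out, the same stain can play either role depending on the study, so the mapping is keyed by assay rather than fixed per channel.

```python
from typing import Dict, List, Sequence, Tuple

# A channel-first image: channels x height x width, as nested lists.
Image = List[List[List[float]]]

# Hypothetical assay-keyed role assignments (illustrative, not from C3R).
# The nuclear stain is context in a localization screen but the readout
# (concept) in a cell-cycle study.
ASSAY_GROUPS: Dict[str, Dict[str, str]] = {
    "localization_screen": {
        "nucleus": "context",
        "microtubules": "context",
        "er": "context",
        "protein": "concept",
    },
    "cell_cycle": {
        "nucleus": "concept",
        "microtubules": "context",
    },
}


def split_context_concept(
    image: Image, channel_names: Sequence[str], assay: str
) -> Tuple[Image, Image]:
    """Split an image into (context, concept) channel stacks for one assay."""
    if len(image) != len(channel_names):
        raise ValueError("expected exactly one name per channel")
    groups = ASSAY_GROUPS[assay]
    context: Image = []
    concept: Image = []
    for name, channel in zip(channel_names, image):
        role = groups.get(name)
        if role == "context":
            context.append(channel)
        elif role == "concept":
            concept.append(channel)
        else:
            raise KeyError(f"channel {name!r} has no role in assay {assay!r}")
    return context, concept


# Usage: four 1x1 channels whose pixel value equals the channel index.
names = ["nucleus", "microtubules", "er", "protein"]
image = [[[float(i)]] for i in range(4)]
ctx, con = split_context_concept(image, names, "localization_screen")
print(len(ctx), len(con))  # 3 1
```

Raising on an unlabeled channel, rather than silently dropping it, makes the manual labeling step explicit. That step is exactly the part the reviewers consider subjective.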
Findings
C3R outperforms existing baselines on in-distribution (ID) and out-of-distribution (OOD) tasks.
A simple implementation of C3R surpasses previous channel-adaptive methods on CHAMMI.
C3R enables cross-dataset generalization without dataset-specific retraining.
Abstract
Immunohistochemical (IHC) images reveal detailed information about structures and functions at the subcellular level. However, unlike natural images, IHC datasets pose challenges for deep learning models due to their inconsistencies in channel count and configuration, stemming from varying staining protocols across laboratories and studies. Existing approaches build channel-adaptive models, which unfortunately fail to support out-of-distribution (OOD) evaluation across IHC datasets and cannot be applied in a true zero-shot setting with mismatched channel counts. To address this, we introduce a structured view of cellular image channels by grouping them into either context or concept, where we treat the context channels as a reference to the concept channels in the image. We leverage this context-concept principle to develop Channel Conditioned Cell Representations (C3R), a framework…
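The abstract is cut off before it describes the architecture, so the sketch below is not C3R's mechanism. It only illustrates why labeling channels as context or concept can give datasets with mismatched channel counts a shared input layout. It assumes simple pixel-wise averaging within each group. The functions `mean_channel` and `to_two_group_input` are hypothetical.

```python
from typing import List, Sequence

Channel = List[List[float]]  # height x width


def mean_channel(channels: Sequence[Channel]) -> Channel:
    """Pixel-wise mean of a non-empty list of equally sized channels."""
    if not channels:
        raise ValueError("each group needs at least one channel")
    n = len(channels)
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [sum(c[y][x] for c in channels) / n for x in range(width)]
        for y in range(height)
    ]


def to_two_group_input(
    image: Sequence[Channel], roles: Sequence[str]
) -> List[Channel]:
    """Collapse any number of labeled channels into [context, concept]."""
    if len(image) != len(roles):
        raise ValueError("expected exactly one role per channel")
    context = [c for c, r in zip(image, roles) if r == "context"]
    concept = [c for c, r in zip(image, roles) if r == "concept"]
    return [mean_channel(context), mean_channel(concept)]


# A 3-channel and a 5-channel image both map to a 2-channel input.
three = to_two_group_input(
    [[[1.0]], [[3.0]], [[5.0]]], ["context", "context", "concept"]
)
five = to_two_group_input(
    [[[2.0]], [[4.0]], [[6.0]], [[8.0]], [[10.0]]],
    ["context", "concept", "context", "concept", "context"],
)
print(three)  # [[[2.0]], [[5.0]]]
print(len(five))  # 2
```

Averaging discards per-channel detail. A real channel-adaptive model would more likely encode each channel separately. The toy version only shows that the two-group structure itself is independent of channel count.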
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
* Improved training strategy: Masked Context Distillation with concept-context segregation is intuitive and, to an extent, adds a biological data-structure prior to training.
* Comprehensive evaluation and improved performance on several benchmark datasets (HPA, JUMP-CP, CHAMMI-ZS) in both in-distribution and OOD settings.
* The assignment of channels to concept or context groups is rather arbitrary, or manually and subjectively labeled.
* A random channel masking baseline across student/teacher (i.e., channel masking as an augmentation) is missing, so the contribution of MCD (masking context channels specifically) is unclear.
* The contribution is very specific to IHC/fluorescence microscopy images, and the subjective nature of context/concept selection makes it hard to generalize and scale as a methodology.
* CHAMMI
* The paper addresses a real issue in image representation learning for microscopy: modern representation learning architectures are designed for natural RGB images, which always come in a three-channel format with no significantly distinct information across channels.
* The paper introduces a new and intuitive way to structure microscopy data by grouping channels into "context" and "concept". The model's success should speak to the power of grouping chann
* The paper claims that its framework is the first to demonstrate strong zero-shot evaluation for this problem, but I found the following paper that purports to do the same and is similarly channel-adaptive [1].
* The model's ability to evaluate a new OOD dataset relies on the assumption that the new dataset's channels can also be separated into "context" and "concept" groups. While the authors note this is true for the most common public IHC datasets (HPA, JUMP-CP, WTC-11, etc
1. Important and relevant problem of fluorescence microscopy representation learning.
2. Overall well written and easy to read.
3. Conceptually original idea (context/concept channel separation).
4. Comprehensive ablation study.
1. Conceptually, it is not so clear how to distinguish between context and concept channels. First, this distinction depends on the study design: the same channel might provide context for one assay and content for another. For instance, nuclei are context information for subcellular localization screens, while they provide the relevant readout when studying cell cycle, nucleoli, aneuploidy or cell death.
2. Related to this, I am not entirely convinced by the analysis and results provided in s
* The idea of grouping channels into context and concept is interesting and biologically motivated.
* Based on this idea, the paper proposes an inductive bias for the ViT architecture and for model training.
* The paper presents previous work well.
* The experiments use established datasets to evaluate performance and investigate the properties of the models.
The paper has several issues, and the experimental results do not support its conclusions and claims.
### Context / Concept
* The main technical limitation of the proposed approach is that context and concept channels must be defined manually, and this decision may be quite arbitrary in practice.
* The quantification of context and concept seems to match the hypothesis, but it is not fully supported by the data (as discussed in the Supplementary material). The sepa
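The reviewers ask for a random channel-masking baseline. That request can be made concrete by contrasting two ways of building the student's input in a teacher-student setup. Based only on the reviewers' description, the sketch assumes that MCD masks context channels in the student's view while the teacher sees the full image. The function names, the zero-fill masking, and the masking probability `p` are hypothetical, and the distillation loss itself is omitted.

```python
import random
from typing import List, Sequence, Tuple

Channel = List[List[float]]  # height x width


def zeros_like(channel: Channel) -> Channel:
    """Return an all-zero channel with the same shape."""
    return [[0.0 for _ in row] for row in channel]


def mask_context(image: Sequence[Channel], roles: Sequence[str]) -> List[Channel]:
    """Zero every context channel; concept channels pass through unchanged."""
    return [zeros_like(c) if r == "context" else c for c, r in zip(image, roles)]


def mask_random(image: Sequence[Channel], p: float, rng: random.Random) -> List[Channel]:
    """Baseline: zero each channel with probability p, ignoring its role."""
    return [zeros_like(c) if rng.random() < p else c for c in image]


def teacher_student_views(
    image: Sequence[Channel],
    roles: Sequence[str],
    strategy: str,
    p: float = 0.5,
    seed: int = 0,
) -> Tuple[List[Channel], List[Channel]]:
    """Teacher sees every channel; the student sees a masked copy."""
    if len(image) != len(roles):
        raise ValueError("expected exactly one role per channel")
    teacher = list(image)
    if strategy == "context":
        student = mask_context(image, roles)
    elif strategy == "random":
        student = mask_random(image, p, random.Random(seed))
    else:
        raise ValueError(f"unknown strategy {strategy!r}")
    return teacher, student


image = [[[1.0]], [[2.0]], [[3.0]]]
roles = ["context", "context", "concept"]
_, student = teacher_student_views(image, roles, "context")
print(student)  # [[[0.0]], [[0.0]], [[3.0]]]
```

Running both strategies with the same encoder and loss would isolate what the reviewers question: whether masking context channels specifically helps more than masking channels at random.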
Taxonomy
Methods: Knowledge Distillation
