Climbing the label tree: Hierarchy-preserving contrastive learning for medical imaging
Alif Elham Khan

TL;DR
This paper introduces a hierarchy-preserving contrastive learning framework for medical imaging that leverages taxonomic label structures to improve representation quality and interpretability.
Contribution
It proposes two plug-in objectives, HWC and LAM, that incorporate label hierarchy into contrastive learning without architectural changes, enhancing taxonomy alignment.
Findings
Consistently improves representation quality over strong SSL baselines.
Respects the taxonomy better, as measured by hierarchy-aware metrics such as HF1 and H-Acc.
Effective even in Euclidean space (without hyperbolic curvature), especially when HWC and LAM are combined.
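To make the hierarchy-aware metrics concrete, here is a minimal sketch of one common definition of hierarchical F1: each true and predicted label is expanded with its ancestors, and precision and recall are computed on the expanded sets. This summary does not give the paper's exact HF1 definition, so treat this as an assumption. The toy taxonomy `TOY_PARENT` and the function names are hypothetical and only for illustration.

```python
# Hypothetical two-level toy taxonomy (child -> parent); top-level nodes map to None.
TOY_PARENT = {
    "benign": None,
    "malignant": None,
    "adenosis": "benign",
    "fibroadenoma": "benign",
    "ductal_carcinoma": "malignant",
}


def with_ancestors(label, parent):
    """Return the set containing the label and all of its ancestors."""
    nodes = set()
    while label is not None:
        nodes.add(label)
        label = parent.get(label)
    return nodes


def hierarchical_f1(y_true, y_pred, parent):
    """Micro-averaged hierarchical F1 over ancestor-expanded label sets.

    Assumed definition (set-overlap style), not necessarily the paper's.
    """
    overlap = pred_total = true_total = 0
    for t, p in zip(y_true, y_pred):
        t_set = with_ancestors(t, parent)
        p_set = with_ancestors(p, parent)
        overlap += len(t_set & p_set)
        pred_total += len(p_set)
        true_total += len(t_set)
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / true_total
    return 2 * precision * recall / (precision + recall)


# A sibling confusion keeps the shared parent "benign", so it earns partial credit.
sibling_score = hierarchical_f1(["adenosis"], ["fibroadenoma"], TOY_PARENT)
# A confusion across top-level groups shares no node, so it earns none.
distant_score = hierarchical_f1(["adenosis"], ["ductal_carcinoma"], TOY_PARENT)
```

Under this definition a flat accuracy metric would score both mistakes identically, while the hierarchical score separates a within-parent confusion from a cross-parent one.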
Abstract
Medical image labels are often organized by taxonomies (e.g., organ - tissue - subtype), yet standard self-supervised learning (SSL) ignores this structure. We present a hierarchy-preserving contrastive framework that makes the label tree a first-class training signal and an evaluation target. Our approach introduces two plug-in objectives: Hierarchy-Weighted Contrastive (HWC), which scales positive/negative pair strengths by shared ancestors to promote within-parent coherence, and Level-Aware Margin (LAM), a prototype margin that separates ancestor groups across levels. The formulation is geometry-agnostic and applies to Euclidean and hyperbolic embeddings without architectural changes. Across several benchmarks, including breast histopathology, the proposed objectives consistently improve representation quality over strong SSL baselines while better respecting the taxonomy. We…
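The idea of scaling pair strengths by shared ancestors can be sketched as follows. This is not the paper's HWC formulation, which is not reproduced in this summary. It is a simplified, assumed variant of a supervised contrastive loss in which each negative pair receives an in-softmax bonus that grows as the two labels share fewer ancestors. The names `TOY_PARENT`, `shared_ancestor_fraction`, `hwc_style_loss`, and the parameters `tau` and `beta` are hypothetical.

```python
import math

# Hypothetical two-level toy taxonomy (child -> parent); top-level nodes map to None.
TOY_PARENT = {
    "benign": None,
    "malignant": None,
    "adenosis": "benign",
    "fibroadenoma": "benign",
    "ductal_carcinoma": "malignant",
}


def path_from_top(label, parent):
    """Return the path of nodes from the top-level ancestor down to the label."""
    path = []
    while label is not None:
        path.append(label)
        label = parent.get(label)
    return path[::-1]


def shared_ancestor_fraction(a, b, parent):
    """Fraction of the label path shared by a and b (1.0 for identical labels)."""
    pa, pb = path_from_top(a, parent), path_from_top(b, parent)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return shared / max(len(pa), len(pb))


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def hwc_style_loss(embeddings, labels, parent, tau=0.1, beta=1.0):
    """Assumed hierarchy-weighted variant of a supervised contrastive loss.

    Positives are samples with the same leaf label. Every other sample j gets
    the logit sim(i, j) / tau + beta * (1 - shared_fraction), so taxonomically
    distant negatives weigh more in the softmax denominator. With beta = 0 this
    reduces to a plain supervised contrastive loss.
    """
    total, n_anchors = 0.0, 0
    for i in range(len(embeddings)):
        logits, positives = {}, []
        for j in range(len(embeddings)):
            if j == i:
                continue
            frac = shared_ancestor_fraction(labels[i], labels[j], parent)
            sim = cosine(embeddings[i], embeddings[j])
            logits[j] = sim / tau + beta * (1.0 - frac)
            if labels[j] == labels[i]:
                positives.append(j)
        if not positives:
            continue  # anchors without a positive contribute nothing
        top = max(logits.values())
        log_denom = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
toy_labels = ["adenosis", "adenosis", "fibroadenoma", "ductal_carcinoma"]
plain_loss = hwc_style_loss(toy_embeddings, toy_labels, TOY_PARENT, beta=0.0)
weighted_loss = hwc_style_loss(toy_embeddings, toy_labels, TOY_PARENT, beta=1.0)
```

Because the bonus sits inside the softmax and differs per pair, it cannot be reproduced by changing the single temperature `tau`. Swapping `cosine` for a negative hyperbolic distance would be one way to move the same sketch to a non-Euclidean geometry, again as an assumption rather than the authors' implementation.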
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The framework is shown to be effective as a "drop-in" for standard Euclidean pipelines and also benefits from hyperbolic geometry, especially on deeper trees. This makes the method broadly applicable. 2. The ablation study in Figure 2 and Section 4.6 effectively disentangles the HWC gain from a simple temperature-tuning effect, supporting the authors' claim about the importance of pair-specific, in-softmax reshaping
1. The abstract mentions "self-supervised learning (SSL)", but the proposed HWC and LAM methods are fully supervised, requiring fine-grained leaf labels and the full label tree during pretraining. This should be made clearer from the outset. While the paper states it's "compatible with SSL using view-positives only", this claim is not experimentally verified. 2. The method introduces a non-trivial number of new hyperparameters. While typical ranges are provided, a comprehensive sensitivity analy
The paper is convincingly motivated by hierarchical medical labels. The proposed method is tested on multiple datasets and shows consistent performance gains.
Other closely related work on hierarchy-aware contrastive learning [1,2] is not discussed in the manuscript, and some of those methods look similar to the proposed one. The authors should clarify how the proposed method differs from existing methods and, if possible, compare with them quantitatively in the experiments. Although this manuscript is motivated by medical imaging, the proposed method itself does not look medical-specific. The loss formulatio
S1. Problem formulation - Flat objectives ignore how “near” or “far” two labels are in a label tree. The paper makes the tree both a training signal and an evaluation target, and uses patient-level splits on medical sets (e.g., BreakHis) to avoid leakage. S2. Drop-in objectives - HWC and LAM slot into standard contrastive pipelines; swapping Euclidean vs. hyperbolic only changes the metric and mean operator, not the architecture or optimizer. This makes the proposed method easily adaptable and can
W1. Marginal novelty, incremental in nature - Pair-wise hierarchy shaping and prototype margins are known ideas; the main step is to put hierarchy scaling inside the softmax and combine it with level margins. Positioning the work against prior hierarchical contrastive/margin literature would be useful, in my view. W2. Limited shift analysis - Medical data shift by site, stain, and magnification. The paper does not test cross-magnification or stain/center robustness, even though BreakHis spans 40
The paper is clearly written, with a precise problem formulation and compelling motivation. The proposed framework is conceptually sound and effectively highlights the limitations of existing SSL methods in medical images.
1. While the idea of hierarchical contrastive learning is intuitive, the method depends on datasets annotated at multiple granularity levels (e.g., both global categories and fine-grained subtypes), which restricts its general applicability. 2. The use of multiple level-wise prototypes introduces additional computational overhead—yet the paper lacks a detailed analysis of training efficiency. 3. There is no systematic discussion on how key hyperparameters (e.g., those mentioned around L250) are
Taxonomy
Topics: AI in cancer detection · Medical Image Segmentation Techniques · Domain Adaptation and Few-Shot Learning
