Structured Contrastive Learning for Interpretable Latent Representations

Zhengyang Shen; Hua Tu; Mayue Shi

arXiv:2511.14920·cs.LG·November 20, 2025

TL;DR

This paper introduces Structured Contrastive Learning (SCL), a novel framework that partitions the latent space into interpretable groups, significantly improving both the interpretability of neural networks and their robustness to transformations such as phase shifts and rotations.

Contribution

The paper proposes SCL, a framework that explicitly structures latent spaces into semantic groups, enhancing robustness and interpretability without architectural changes.

Findings

1. ECG latent cosine similarity under phase shifts improved from 0.25 to 0.91
2. Activity recognition accuracy under IMU sensor rotations reached 86.65%
3. SCL outperforms traditional data augmentation methods
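To make the two transformations and the similarity metric behind these findings concrete, here is a minimal NumPy sketch. Everything in it is a hypothetical stand-in rather than the paper's data or model: the 360 Hz sampling rate, the circular (wrap-around) phase shift, the rotation about the z-axis, the toy sine "ECG", and the random linear "encoder".

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D latent vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def phase_shift(signal: np.ndarray, shift_ms: float, fs_hz: float) -> np.ndarray:
    """Circularly shift a 1-D signal by shift_ms milliseconds (wrap-around is an assumption)."""
    n_samples = int(round(shift_ms * fs_hz / 1000.0))
    return np.roll(signal, n_samples)

def rotate_imu(xyz: np.ndarray, angle_rad: float) -> np.ndarray:
    """Rotate a (T, 3) accelerometer trace about the z-axis (hypothetical axis choice)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return xyz @ rot.T

rng = np.random.default_rng(0)
fs = 360.0                                    # hypothetical sampling rate (Hz)
t = np.arange(int(2 * fs)) / fs               # 2 s of samples
ecg_like = np.sin(2 * np.pi * 1.2 * t)        # toy periodic stand-in for an ECG
encoder = rng.standard_normal((16, t.size))   # hypothetical linear "encoder"

z = encoder @ ecg_like
z_shifted = encoder @ phase_shift(ecg_like, 75.0, fs)  # 75 ms -> 27 samples at 360 Hz
print(f"latent cosine similarity after a 75 ms shift: {cosine_similarity(z, z_shifted):.2f}")

imu = rng.standard_normal((100, 3))           # toy 3-axis IMU trace
imu_rotated = rotate_imu(imu, np.pi / 2)      # 90-degree sensor rotation
encoder_imu = rng.standard_normal((16, imu.size))
sim_imu = cosine_similarity(encoder_imu @ imu.ravel(), encoder_imu @ imu_rotated.ravel())
print(f"IMU latent cosine similarity after rotation: {sim_imu:.2f}")
```

A rotation matrix preserves the norm of each IMU sample, so the raw signal carries the same information before and after rotation; an unconstrained encoder, however, has no reason to map the two versions to similar latents, which is the brittleness the findings quantify.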

Abstract

Neural networks exhibit severe brittleness to semantically irrelevant transformations. A mere 75 ms electrocardiogram (ECG) phase shift degrades latent cosine similarity from 1.0 to 0.2, while sensor rotations collapse the performance of activity recognition based on inertial measurement units (IMUs). We identify the root cause as "laissez-faire" representation learning, where latent spaces evolve unconstrained as long as task performance is satisfied. We propose Structured Contrastive Learning (SCL), a framework that partitions latent space representations into three semantic groups: invariant features that remain consistent under given transformations (e.g., phase shifts or rotations), variant features that actively differentiate transformations via a novel variant mechanism, and free features that preserve task flexibility. This creates controllable push-pull dynamics where different latent…
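The three-way partition and push-pull dynamics described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the group sizes, the cosine-based pull term, and the hinge-style push term (standing in for the paper's variant mechanism, which the truncated abstract does not detail) are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def _cos(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def split_latent(z: np.ndarray, n_inv: int, n_var: int):
    """Partition a 1-D latent vector into (invariant, variant, free) groups."""
    if n_inv <= 0 or n_var <= 0 or n_inv + n_var > z.shape[0]:
        raise ValueError("need n_inv > 0, n_var > 0 and n_inv + n_var <= latent dim")
    return z[:n_inv], z[n_inv:n_inv + n_var], z[n_inv + n_var:]

def structured_contrastive_loss(z, z_t, n_inv, n_var, margin=0.0):
    """Hypothetical push-pull loss between latent z of an input and z_t of its transformed copy.

    pull: invariant groups should agree (cosine similarity -> 1)
    push: variant groups should disagree (cosine similarity pushed below `margin`)
    free: the remaining dimensions are left unconstrained by this loss
    """
    inv, var, _ = split_latent(z, n_inv, n_var)
    inv_t, var_t, _ = split_latent(z_t, n_inv, n_var)
    pull = 1.0 - _cos(inv, inv_t)
    push = max(0.0, _cos(var, var_t) - margin)
    return pull + push

# Toy usage: 6-D latent = 2 invariant + 2 variant + 2 free dimensions.
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z_t = z.copy()
z_t[2:4] *= -1.0  # variant group flipped; invariant and free groups unchanged
print(structured_contrastive_loss(z, z_t, n_inv=2, n_var=2))  # close to 0
print(structured_contrastive_loss(z, z, n_inv=2, n_var=2))    # close to 1: variant group did not move
```

The design choice worth noting is that the free group receives no contrastive term at all, so it is shaped only by whatever task loss it is paired with; this mirrors the abstract's description of free features as preserving task flexibility.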

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 4

Strengths

Shift invariance is an important problem for most temporal signals. Investigating this problem in self-supervised learning for temporal signals is novel.

Weaknesses

The paper lacks a thorough literature review. The references are minimal and do not include key related work. In particular, the authors should compare their approach with prior studies such as [1], which explore similar problems in supervised learning. The proposed method also shows limited novelty. Related works like [2] have already introduced the idea of separating latent spaces or embedding representations to capture invariant and variant factors. The paper does not clearly state how its c…

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. This paper is well written and well organized. 2. Extensive experiments are conducted.

Weaknesses

1. The authors' motivation to shift "from data augmentation to structured contrastive learning" relies primarily on empirical observations (such as the decline in ECG similarity) but lacks formal theoretical analysis and does not address unresolved gaps in the literature. For example, "laissez-faire representation learning" is described as the root cause, but the concept is neither formalized nor quantified, nor is its universality across different tasks verified. Therefore, the motivation section lacks rigor

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The three-part feature partitioning is well-motivated and directly addresses practical challenges, such as phase sensitivity in ECG analysis. 2. The paper is clearly and effectively written, making the proposed method accessible. 3. The experiments demonstrate the method’s advantages over the selected baselines (and datasets).

Weaknesses

1. The paper overlooks key works in time series contrastive learning, such as T-Loss [1] and TS2Vec [2], which should be discussed and included as baselines for a comprehensive comparison. 2. The evaluation is limited to only two datasets, which is relatively narrow for a generalist conference like ICLR (see, for example, the experimental section of [1]). 3. The training process—specifically, how the task and contrastive losses are combined—lacks clarity, making reproducibility challenging. [1]
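For context on the reviewer's third point, one common way to combine a task loss with group-wise contrastive terms is a weighted sum. This is an assumption for illustration, not the formulation in the submission (whose training procedure is exactly what the reviewer found unclear); the weights $\lambda_{\text{inv}}$, $\lambda_{\text{var}}$ and the margin $m$ are hypothetical, and $z$, $z'$ denote the latents of an input and its transformed copy:

```latex
\mathcal{L} = \mathcal{L}_{\text{task}}
  + \lambda_{\text{inv}} \bigl(1 - \cos(z_{\text{inv}}, z'_{\text{inv}})\bigr)
  + \lambda_{\text{var}} \max\bigl(0,\ \cos(z_{\text{var}}, z'_{\text{var}}) - m\bigr)
```

Under this assumed form, the free group $z_{\text{free}}$ is shaped only through $\mathcal{L}_{\text{task}}$, and reproducing any reported result would additionally require the actual weight values and whether they are fixed or scheduled during training.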

Taxonomy

Topics: Non-Invasive Vital Sign Monitoring · Context-Aware Activity Recognition Systems · EEG and Brain-Computer Interfaces