THCRL: Trusted Hierarchical Contrastive Representation Learning for Multi-View Clustering

Jian Zhu

arXiv:2512.00368·cs.CV·December 11, 2025

Open Access · 4 Reviews

TL;DR

This paper introduces THCRL, a multi-view clustering method that aims to make fusion more trustworthy by handling noise within individual views and by exploiting nearest-neighbor structure within clusters. The authors report state-of-the-art results.

Contribution

The paper proposes a new framework with DSHF and AKCL modules to enhance multi-view fusion reliability and clustering accuracy.

Findings

01

Achieves state-of-the-art performance in deep multi-view clustering

02

Effectively handles noise in individual views

03

Improves representation alignment within clusters

Abstract

Multi-View Clustering (MVC) has garnered increasing attention in recent years. It partitions data samples into distinct groups by learning a consensus representation. However, a significant challenge remains: untrustworthy fusion. This problem arises primarily from two factors: 1) existing methods often ignore the inherent noise within individual views; 2) in traditional MVC methods using Contrastive Learning (CL), similarity computations typically rely on different views of the same instance, while neglecting the structural information from nearest neighbors within the same cluster. Consequently, multi-view fusion is steered in the wrong direction. To address this problem, we present a novel Trusted Hierarchical Contrastive Representation Learning (THCRL). It consists of two key modules. Specifically, we propose the Deep Symmetry…
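To make the abstract's second point concrete, the following is a minimal sketch in plain Python of the difference between instance-level positives (only the same sample in the other view) and neighbor-aware positives (the same sample plus its nearest neighbors). It is not the paper's loss: the abstract is cut off before the modules are defined, so the function names, the toy features, the neighbor count `k`, the temperature `tau`, and the choice to find neighbors in the first view are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_indices(feats, i, k):
    """Indices of the k samples most cosine-similar to sample i (excluding i)."""
    others = [j for j in range(len(feats)) if j != i]
    others.sort(key=lambda j: cosine(feats[i], feats[j]), reverse=True)
    return others[:k]


def instance_positives(n):
    """Instance-level CL: the only positive for sample i is sample i in the other view."""
    return [[i] for i in range(n)]


def neighbor_positives(feats, k):
    """Neighbor-aware CL (assumed variant): sample i plus its k nearest neighbors."""
    return [[i] + knn_indices(feats, i, k) for i in range(len(feats))]


def contrastive_loss(view_a, view_b, positives, tau=0.5):
    """InfoNCE-style loss, averaged over anchors and over each anchor's positives."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        log_denom = math.log(sum(math.exp(x) for x in logits))
        log_probs = [logits[j] - log_denom for j in positives[i]]
        total += -sum(log_probs) / len(log_probs)
    return total / n


# Toy data (hypothetical): four samples forming two clusters, seen in two views.
view_a = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
view_b = [[0.95, 0.15], [1.0, 0.05], [0.15, 0.95], [0.05, 1.0]]

loss_instance = contrastive_loss(view_a, view_b, instance_positives(len(view_a)))
loss_neighbor = contrastive_loss(view_a, view_b, neighbor_positives(view_a, k=1))
print("instance-level loss:", loss_instance)
print("neighbor-aware loss:", loss_neighbor)
```

In the instance-level variant, a same-cluster neighbor sits in the denominator as a negative and is pushed away from the anchor. In the neighbor-aware variant it is counted as a positive instead, which is the kind of within-cluster structure the abstract says conventional CL-based MVC neglects.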

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. This article is clearly structured, clearly expressed, and easy to understand. 2. The method proposed in this paper is an improvement over some existing methods.

Weaknesses

1. The two key factors claimed in the paper, (1) the intrinsic noise of individual views and (2) CL at the sample level, have already been addressed in numerous papers. The authors clearly did not carefully survey the latest progress in this field. For example, regarding the second key factor, please refer to the following papers. The problem of false negatives has been considered in a large number of works, but the paper neither analyzes nor mentions this progress. [1] Li, Junnan, et al. “Prototypical

Reviewer 02 · Rating 2 · Confidence 5

Strengths

1. This paper has a clear organizational structure that facilitates reading, and provides thorough descriptions of both the motivation and experimental results. 2. The comparative experimental results in Tables 2 and 3 are comprehensive, and the ablation study results in Table 4 demonstrate that both modules contribute positively to improving clustering performance.

Weaknesses

1. The paper's motivation and proposed methods lack innovation. The issue of view noise was already addressed in "Investigating and mitigating the side effects of noisy views for self-supervised clustering algorithms in practical multi-view scenarios" (2024 CVPR). The structural information of nearest neighbors within the same cluster was discussed in "Twin Contrastive Learning for Online Clustering" (2022 IJCV). Furthermore, the problem of untrusted fusion in unsupervised scenarios has been ext

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. This framework introduces a UNet-based Deep Symmetry Hierarchical Fusion with multi-denoising components, effectively isolating noise to enhance fusion trustworthiness. 2. The experiment demonstrates state-of-the-art clustering results on six datasets, outperforming eight existing SOTA methods and validating strong generalizability. 3. This study conducts comprehensive ablation experiments, confirming the necessity of DSHF and AKCL modules through significant performance drops when either com

Weaknesses

1. This paper only analyzes the impact of loss weight and temperature, lacking systematic exploration of other key parameters like encoder stages or neighbor count. 2. UNet-based DSHF may increase computational complexity, but no comparison with lightweight MVC baselines is provided. 3. Although DSHF claims denoising capabilities, this paper does not include controlled denoising experiments to quantify performance under varying noise levels.

Reviewer 04 · Rating 2 · Confidence 4

Strengths

1. The DSHF module, through mechanisms such as the view attention network, channel attention network, and symmetric hierarchical fusion, effectively isolates and suppresses noise within the feature space. This achieves a more reliable fusion of multi-view data compared to traditional concatenation or weighted-sum approaches. 2. The core concept of the AKCL module is to enhance the representation similarity among samples within the same cluster, rather than merely focusing on different views of t

Weaknesses

1. There is a significant overlap between this manuscript and both "Trusted Mamba Contrastive Network for Multi-View Clustering" (Zhu et al., 2025) and "Self-supervised Trusted Contrastive Multi-view Clustering with Uncertainty Refined" (Hu et al., 2025) in terms of motivation, proposed methodology (e.g., loss functions, model architecture, block diagrams), and the logical structure of the text, particularly in the Abstract and Introduction sections. The current work merely cites the aforementioned pap

Taxonomy

Topics: Face and Expression Recognition · Domain Adaptation and Few-Shot Learning · Face recognition and analysis