THCRL: Trusted Hierarchical Contrastive Representation Learning for Multi-View Clustering
Jian Zhu

TL;DR
This paper introduces THCRL, a multi-view clustering method that aims to make fusion more trustworthy by handling noise within individual views and by using structural information from nearest neighbors in the same cluster. The authors report state-of-the-art results.
Contribution
The paper proposes a framework built on two modules, Deep Symmetry Hierarchical Fusion (DSHF) and AKCL, to improve the reliability of multi-view fusion and clustering accuracy.
Findings
Achieves state-of-the-art performance in deep multi-view clustering
Effectively handles noise in individual views
Improves representation alignment within clusters
Abstract
Multi-View Clustering (MVC) has garnered increasing attention in recent years. It is capable of partitioning data samples into distinct groups by learning a consensus representation. However, a significant challenge remains: the problem of untrustworthy fusion. This problem primarily arises from two key factors: 1) Existing methods often ignore the presence of inherent noise within individual views; 2) In traditional MVC methods using Contrastive Learning (CL), similarity computations typically rely on different views of the same instance, while neglecting the structural information from nearest neighbors within the same cluster. Consequently, this leads to the wrong direction for multi-view fusion. To address this problem, we present a novel Trusted Hierarchical Contrastive Representation Learning (THCRL). It consists of two key modules. Specifically, we propose the Deep Symmetry…
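The second factor in the abstract contrasts instance-level positives with neighbor-aware positives. As a rough illustration only, and not the authors' AKCL module, the sketch below computes an InfoNCE-style loss in which each anchor's positives are its own counterpart in the other view plus its k nearest neighbors. The function names, the use of cosine similarity, the choice to search for neighbors in the first view's space, and the defaults `k=1` and `tau=0.5` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_indices(points, i, k):
    """Indices of the k points most similar to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: cosine(points[i], points[j]), reverse=True)
    return others[:k]


def neighbor_aware_contrastive_loss(view_a, view_b, k=1, tau=0.5):
    """Hypothetical neighbor-aware InfoNCE loss (an assumption, not the paper's loss).

    For anchor i in view_a, the positives in view_b are sample i itself plus
    the k nearest neighbors of i (neighbors are found in view_a's space).
    With k=0 this reduces to the usual instance-level contrastive loss.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarities from anchor i to every sample in view_b.
        logits = [cosine(view_a[i], view_b[j]) / tau for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        positives = [i] + knn_indices(view_a, i, k)
        # Average negative log-probability over the positive set.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
    return total / n


# Toy data (made up): two clusters of two samples each, seen from two views.
view_a = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
view_b = [[1.0, 0.2], [0.8, 0.1], [0.2, 1.0], [0.1, 0.8]]
loss = neighbor_aware_contrastive_loss(view_a, view_b, k=1, tau=0.5)
```

Setting `k=0` recovers the instance-level objective that the abstract describes as typical of existing CL-based MVC methods, so the two variants can be compared on the same data.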
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. This article is clearly structured, clearly expressed, and easy to understand. 2. The method proposed in this paper is an improvement over some existing methods.
1. The two key factors claimed in the paper, (1) the intrinsic noise of individual views and (2) sample-level CL, have already been addressed in numerous papers. The authors clearly did not carefully investigate the latest progress in this field. For example, regarding the second key factor, please refer to the following papers. The problem of false negatives has been considered in a large number of works, but the paper neither analyzes nor mentions this progress. [1] Li, Junnan, et al. “Prototypical
1. This paper has a clear organizational structure that facilitates reading, and provides thorough descriptions of both the motivation and experimental results. 2. The comparative experimental results in Tables 2 and 3 are comprehensive, and the ablation study results in Table 4 demonstrate that both modules contribute positively to improving clustering performance.
1. The paper's motivation and proposed methods lack innovation. The issue of view noise was already addressed in "Investigating and mitigating the side effects of noisy views for self-supervised clustering algorithms in practical multi-view scenarios" (2024 CVPR). The structural information of nearest neighbors within the same cluster was discussed in "Twin Contrastive Learning for Online Clustering" (2022 IJCV). Furthermore, the problem of untrusted fusion in unsupervised scenarios has been ext
1. This framework introduces a UNet-based Deep Symmetry Hierarchical Fusion with multi-denoising components, effectively isolating noise to enhance fusion trustworthiness. 2. The experiment demonstrates state-of-the-art clustering results on six datasets, outperforming eight existing SOTA methods and validating strong generalizability. 3. This study conducts comprehensive ablation experiments, confirming the necessity of DSHF and AKCL modules through significant performance drops when either com
1. This paper only analyzes the impact of loss weight and temperature, lacking systematic exploration of other key parameters like encoder stages or neighbor count. 2. UNet-based DSHF may increase computational complexity, but no comparison with lightweight MVC baselines is provided. 3. Although DSHF claims denoising capabilities, this paper does not include controlled denoising experiments to quantify performance under varying noise levels.
1. The DSHF module, through mechanisms such as the view attention network, channel attention network, and symmetric hierarchical fusion, effectively isolates and suppresses noise within the feature space. This achieves a more reliable fusion of multi-view data compared to traditional concatenation or weighted-sum approaches. 2. The core concept of the AKCL module is to enhance the representation similarity among samples within the same cluster, rather than merely focusing on different views of t
1. There is a significant overlap between this manuscript and "Trusted Mamba Contrastive Network for Multi-View Clustering" (Zhu et al., 2025) and "Self-supervised Trusted Contrastive Multi-view Clustering with Uncertainty Refined" (Hu et al., 2025) in terms of motivation, proposed methodology (e.g., loss functions, model architecture, block diagrams), and the logical structure of the text, particularly in the Abstract and Introduction sections. The current work merely cites the aforementioned pap
Taxonomy
Topics: Face and Expression Recognition · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
