Deep Clustering with Associative Memories

Bishwajit Saha; Dmitry Krotov; Mohammed J. Zaki; Parikshit Ram

arXiv:2601.00963·cs.CV·January 6, 2026



Open Access · 3 Reviews

TL;DR

This paper introduces DCAM, a novel deep clustering method that integrates representation learning and clustering through an energy-based loss function using Associative Memories, improving clustering quality across various architectures and data types.

Contribution

The paper presents a new loss function based on energy dynamics with Associative Memories, unifying representation learning and clustering in deep learning.

Findings

01 · DCAM improves clustering quality across architectures.

02 · Effective on image and text data.

03 · Compatible with various neural network types.

Abstract

Deep clustering - joint representation learning and latent space clustering - is a well-studied problem, especially in computer vision and text processing under the deep learning framework. While representation learning is generally differentiable, clustering is an inherently discrete optimization task, requiring various approximations and regularizations to fit in a standard differentiable pipeline. This leads to somewhat disjointed representation learning and clustering. In this work, we propose a novel loss function utilizing energy-based dynamics via Associative Memories to formulate a new deep clustering method, DCAM, which ties together the representation learning and clustering aspects more intricately in a single objective. Our experiments showcase the advantage of DCAM, producing improved clustering quality for various architecture choices (convolutional, residual or…
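The abstract excerpt does not spell out the loss, so the following is only an illustrative sketch of what clustering via energy-based Associative Memory dynamics can look like in a latent space. It assumes a log-sum-exp energy over squared distances to one memory vector per cluster, plain gradient descent on that energy, and a loss equal to the squared distance between each latent point and its recalled state. All function names, the step size, and the omission of the encoder and decoder are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def am_energy(v, memories, beta):
    """Assumed energy: -(1/beta) * log(sum_mu exp(-beta * ||v - rho_mu||^2))."""
    sq = np.sum((memories - v) ** 2, axis=1)  # squared distance to each memory
    m = sq.min()                              # shift for numerical stability
    return m - np.log(np.sum(np.exp(-beta * (sq - m)))) / beta

def am_step(v, memories, beta, step):
    """One gradient-descent step on the assumed energy."""
    sq = np.sum((memories - v) ** 2, axis=1)
    w = np.exp(-beta * (sq - sq.min()))
    w /= w.sum()                              # softmax weights over memories
    # -grad E = 2 * sum_mu w_mu * (rho_mu - v)
    return v + step * 2.0 * (w @ (memories - v))

def am_recall(v, memories, beta, n_steps, step):
    """Run the dynamics for a fixed number of steps."""
    for _ in range(n_steps):
        v = am_step(v, memories, beta, step)
    return v

def cluster_and_loss(latents, memories, beta=1.0, n_steps=20, step=0.1):
    """Assign each latent point to the memory nearest its recalled state.

    The returned loss is the mean squared distance between each point and
    its recalled state (an assumed, hypothetical objective for this sketch).
    """
    recalled = np.stack(
        [am_recall(z, memories, beta, n_steps, step) for z in latents]
    )
    dists = np.sum((recalled[:, None, :] - memories[None, :, :]) ** 2, axis=2)
    labels = np.argmin(dists, axis=1)
    loss = float(np.mean(np.sum((latents - recalled) ** 2, axis=1)))
    return labels, loss
```

In a deep clustering setting the `latents` would come from an encoder and the memories would be trained jointly with it; here both are plain arrays so the dynamics can be inspected in isolation. Because every operation in the recall is differentiable, a loss of this shape could in principle be backpropagated through the recall steps, which is the kind of tight coupling between representation learning and clustering that the abstract describes.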

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The paper demonstrates superior results across various benchmarks, highlighting the robustness of the proposed approach. Additionally, the authors conduct experiments across diverse domains and architectural types, effectively showcasing the generalizability of their method. The preliminary section provides a thorough overview of foundational concepts, enhancing the accessibility of the paper. Furthermore, the appendix includes detailed hyperparameter information, essential for validating experi

Weaknesses

First and foremost, this paper strongly resembles the ClAM paper, particularly in the introduction, which feels almost identical. Given that this work relies heavily on ClAM, it is essential to revise and reframe the introduction to clearly distinguish this approach as more than a direct application of ClAM with an autoencoder. Establishing this work's unique contributions will help clarify its originality and value. While the authors claim that the method does not require $\gamma$ tuni

Reviewer 02 · Rating 6 · Confidence 2

Strengths

- The paper is well written and easy to follow.
- The motivation is very clear, the theoretical analysis appears to be correct, the core algorithm is clearly explained, and the experiments are presented thoroughly.

Weaknesses

- The work builds upon ClAM (Saha et al., 2023), which may limit its novelty.
- Arguably, since the work is concerned with deep clustering, the baselines should include traditional metric-learning-based approaches.
- By the standards of contemporary literature, both the neural network trained in this work and the datasets are small. It is unclear whether the proposed method would generalize to more realistic network sizes and datasets.

Reviewer 03 · Rating 3 · Confidence 4

Strengths

The research integrates an autoencoder and pretraining while optimizing the data in the latent space using the ClAM algorithm, which is a good extension of ClAM. The approach tries to maintain a minimal reconstruction loss while finding clusters. The paper describes the technical details well, which makes it reproducible.

Weaknesses

The evaluation used only SC, although ground-truth labels are available for the datasets used. SC alone is not sufficient, especially given its underlying assumptions about cluster distributions. The reported SCs for all experiments are constrained to a 10% range of change in the reconstruction loss, which limits the room for clustering improvement during training. This is not fair to clustering algorithms, especially those that don’t use the decoder anymore in the


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications