Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery

Xuemin Yu; Ankur Garg; Samira Ebrahimi Kahou; Hassan Sajjad

arXiv:2602.02726·cs.LG·February 4, 2026

Open Access

TL;DR

This paper introduces VQLC, a scalable vector quantization method for concept discovery in deep learning models, enabling efficient and interpretable explanations of model representations.

Contribution

VQLC builds on the VQ-VAE architecture to provide a scalable alternative to clustering for concept discovery on large-scale datasets.

Findings

01. VQLC improves scalability over hierarchical clustering.

02. VQLC maintains comparable explanation quality.

03. VQLC enables efficient concept discovery in large models.

Abstract

Deep learning models encode rich semantic information in their hidden representations, but it remains challenging to understand which parts of this information models actually rely on when making predictions. A promising line of post-hoc concept-based explanation methods relies on clustering token representations. However, commonly used approaches such as hierarchical clustering are computationally infeasible for large-scale datasets, and K-Means often yields shallow or frequency-dominated clusters. We propose the vector quantized latent concept (VQLC) method, a framework built on the vector quantized variational autoencoder (VQ-VAE) architecture that learns a discrete codebook mapping continuous representations to concept vectors. We perform thorough evaluations and show that VQLC improves scalability while maintaining comparable quality of human-understandable explanations.
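
The codebook lookup mentioned in the abstract can be illustrated with a minimal sketch. It assumes nearest-neighbor assignment under Euclidean distance, as in the standard VQ-VAE quantization step; the paper's actual training procedure, codebook size, and distance choice are not given on this page. The function name `quantize`, the 8-entry codebook, and the 16-dimensional vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def quantize(reps, codebook):
    """Map each continuous representation to its nearest codebook vector.

    reps:     (n, d) array of token representations
    codebook: (k, d) array of candidate concept vectors (hypothetical here)
    Returns the (n,) array of codebook indices and the (n, d) quantized vectors.
    """
    # Squared Euclidean distances via ||r||^2 - 2 r.c + ||c||^2, shape (n, k).
    d2 = (
        (reps ** 2).sum(axis=1, keepdims=True)
        - 2.0 * reps @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    ids = d2.argmin(axis=1)
    return ids, codebook[ids]


# Toy data (an assumption for illustration, not from the paper):
# three representations lying close to codebook entries 3, 3, and 5.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))
reps = codebook[[3, 3, 5]] + 0.01 * rng.normal(size=(3, 16))

ids, quantized = quantize(reps, codebook)
```

In this sketch, representations that receive the same index fall into the same discrete group, which is the sense in which a codebook entry can act as a concept. Each lookup costs one distance computation per codebook entry, so assignment grows linearly with the number of tokens, in contrast to the pairwise comparisons that hierarchical clustering requires.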

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare