Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery
Xuemin Yu, Ankur Garg, Samira Ebrahimi Kahou, Hassan Sajjad

TL;DR
This paper introduces VQLC, a scalable vector quantization method for concept discovery in deep learning models, enabling efficient and interpretable explanations of model representations.
Contribution
VQLC builds on the VQ-VAE architecture to provide a scalable alternative to clustering for concept discovery on large-scale datasets.
Findings
VQLC improves scalability over hierarchical clustering.
VQLC maintains explanation quality comparable to clustering-based approaches.
VQLC enables efficient concept discovery on large-scale datasets.
Abstract
Deep learning models encode rich semantic information in their hidden representations. However, it remains challenging to understand which parts of this information models actually rely on when making predictions. A promising line of post-hoc concept-based explanation methods relies on clustering token representations. However, commonly used approaches such as hierarchical clustering are computationally infeasible for large-scale datasets, and K-Means often yields shallow or frequency-dominated clusters. We propose the vector quantized latent concept (VQLC) method, a framework built upon the vector quantized variational autoencoder (VQ-VAE) architecture that learns a discrete codebook mapping continuous representations to concept vectors. We perform thorough evaluations and show that VQLC improves scalability while maintaining comparable quality of human-understandable explanations.
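To make the codebook idea concrete, the sketch below shows the generic nearest-neighbor lookup that VQ-VAE-style quantization uses to map a continuous vector to a discrete code. It is an illustration only, not the authors' implementation: the function name `quantize`, the toy codebook, and the use of squared Euclidean distance are assumptions, and codebook training (the part that VQLC actually learns) is omitted.

```python
import numpy as np


def quantize(representations, codebook):
    """Map each representation to its nearest codebook vector.

    Hypothetical helper for illustration; assumes squared Euclidean distance.
    representations: array of shape (n_tokens, dim)
    codebook:        array of shape (n_codes, dim)
    """
    # Pairwise squared distances via broadcasting, shape (n_tokens, n_codes).
    diffs = representations[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Index of the closest code for every token: the discrete "concept id".
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]


# Toy codebook with 4 codes in 8 dimensions (made-up values, not learned).
codebook = np.eye(4, 8)
# Three token representations lying near codes 2, 0, and 2.
tokens = codebook[[2, 0, 2]] + 0.05

concept_ids, concept_vectors = quantize(tokens, codebook)
print(concept_ids.tolist())
```

Under this reading, tokens that share a code index would be grouped into the same latent concept, which plays the role that cluster membership plays in clustering-based methods. The lookup costs O(n_tokens × n_codes) distance computations and can be batched, whereas hierarchical clustering typically needs pairwise comparisons among all tokens.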
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare
