TL;DR
This paper investigates the challenges of vector quantization in graph representation learning, identifies causes of codebook collapse, and proposes RGVQ, a regularization framework that improves codebook utilization and graph token diversity.
Contribution
It provides the first empirical analysis of codebook collapse in graph VQ, offers a theoretical understanding, and introduces RGVQ to enhance graph VQ performance.
Findings
RGVQ significantly improves codebook utilization.
RGVQ boosts performance across multiple graph tasks.
Codebook collapse occurs even with existing mitigation strategies.
Abstract
Vector Quantization (VQ) has recently emerged as a promising approach for learning discrete representations of graph-structured data. However, a fundamental challenge, i.e., codebook collapse, remains underexplored in the graph domain, significantly limiting the expressiveness and generalization of graph tokens. In this paper, we present the first empirical study showing that codebook collapse consistently occurs when applying VQ to graph data, even with mitigation strategies proposed in vision or language domains. To understand why graph VQ is particularly vulnerable to collapse, we provide a theoretical analysis and identify two key factors: early assignment imbalances caused by redundancy in graph features and structural patterns, and self-reinforcing optimization loops in deterministic VQ. To address these issues, we propose RGVQ, a novel framework that integrates graph topology and…
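Codebook collapse is commonly measured through codebook perplexity, the exponential of the entropy of code usage: it equals the codebook size when all codewords are used equally and drops to 1 when every input maps to a single codeword. The following is a minimal sketch of that standard metric, not the paper's implementation; the function name `codebook_perplexity` and the toy assignment arrays are illustrative assumptions.

```python
import numpy as np


def codebook_perplexity(assignments, codebook_size):
    """Perplexity of code usage: exp(entropy) of the empirical code distribution.

    `assignments` holds one codeword index per node (hypothetical input format).
    """
    counts = np.bincount(assignments, minlength=codebook_size).astype(float)
    probs = counts / counts.sum()
    used = probs[probs > 0]  # skip unused codewords to avoid log(0)
    entropy = -np.sum(used * np.log(used))
    return float(np.exp(entropy))


# Toy data (assumed for illustration): 512 nodes, codebook of 8 codewords.
uniform = np.arange(512) % 8            # every codeword used equally often
collapsed = np.zeros(512, dtype=int)    # every node mapped to codeword 0

ppl_uniform = codebook_perplexity(uniform, 8)      # close to the codebook size
ppl_collapsed = codebook_perplexity(collapsed, 8)  # minimal possible value
```

A large gap between the codebook size and the measured perplexity indicates that most codewords are rarely or never selected.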
Peer Reviews
Decision·Submitted to ICLR 2026
The strengths of the article are:
1. A very good analysis of the limitations of VQ for graphs as used before, with numerical experiments reporting the codebook perplexity, which is far from optimal in existing methods.
2. This analysis also comes with a nice theoretical insight (Theorem 1) to explain that, in addition to insights about the self-reinforcing training, which removes less-used codewords from training and reinforces this collapse by preventing unused codewords from being consi
The main weaknesses of the article concern the presentation; here are the points that, in my view, should be improved:
1. For me, Section 3 about the VQ construction was hard to follow. Specifically, the notation $z_i = \delta_j \mathbf{C}$ used for eq. (3) had me go back to the referenced work to understand how the sg operator comes into eq. (3). I think that first presenting the loss of eq. (4) with the sg operator and then explaining how sg works and why we need these three parts in th
- The motivation is clear. The paper explains that codebook collapse is systemic in Graph VQ and empirically shows large gaps between ideal and observed perplexity across datasets and codebook sizes.
- Results show substantial perplexity gains.
- The paper is well-written.
- The two building blocks (Gumbel–Softmax and contrastive regularization) are well known; the novelty lies in how they’re used for VQ in graphs and justified by the analysis. This is meaningful but not very novel.
- This paper gives zero attention to mathematical notation. It uses all types of letters indiscriminately for all types of elements (sets, vectors, matrices, scalars). Even though one can understand, this hurts the soundness of the paper, and makes the paper not publishable at
1. Problem focus is well motivated. Treating collapse itself as the main object of study (instead of just reporting final accuracy) feels important for the emerging “graph as tokens” paradigm. The paper makes a decent case that collapse is systematic in graph VQ, not just an odd failure case.
2. A simple, generally pluggable fix. RGVQ is made of two ideas that our community already understands (Gumbel-Softmax instead of hard argmax, plus a structure-aware contrastive regularizer), and the paper
1. **Theory-to-method gap.** The theoretical part (Thm 1) gives a lower bound saying that nodes with similar features and local computation trees are very likely to get mapped to the same codeword under standard VQ. This motivates why graphs collapse. But the constants are not instantiated in a way that proves the bound is non-vacuous on real data, and the theorem analyzes hard VQ while the proposed method uses Gumbel-Softmax and a contrastive regularizer. There is no formal argument that these
