SOM-VQ: Topology-Aware Tokenization for Interactive Generative Models
Alessandro Londei, Denise Lanzieri, and Matteo Benati

TL;DR
SOM-VQ introduces a topology-aware tokenization method that enhances interpretability and human control in discrete generative models by preserving semantic neighborhood structure in the token space.
Contribution
It combines vector quantization with Self-Organizing Maps to create discrete codebooks with explicit low-dimensional topology, enabling intuitive manipulation and improved learnability.
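The pairing of vector quantization with a Self-Organizing Map can be illustrated with a minimal sketch. It assumes the classic Kohonen update rule applied to a VQ codebook whose entries are laid out row-major on a 2-D grid. The function names (`quantize`, `som_vq_update`), the grid layout, and the hyperparameters `lr` and `sigma` are hypothetical. This is not the authors' implementation, which may differ, for example in how the update is coupled to an encoder or how `sigma` is scheduled during training.

```python
import math
import random

def grid_coords(index, width):
    """Map a flat token index to (row, col) on a row-major 2-D grid (assumed layout)."""
    return divmod(index, width)

def quantize(z, codebook):
    """Standard VQ assignment: index of the codebook vector nearest to z (squared L2)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(z, codebook[k])))

def som_vq_update(z, codebook, width, lr=0.1, sigma=1.0):
    """One topology-aware update (hypothetical): the winning code AND its grid
    neighbors move toward z, weighted by a Gaussian of their grid distance."""
    winner = quantize(z, codebook)
    wr, wc = grid_coords(winner, width)
    for k, code in enumerate(codebook):
        r, c = grid_coords(k, width)
        d2 = (r - wr) ** 2 + (c - wc) ** 2      # squared distance on the grid, not in latent space
        h = math.exp(-d2 / (2 * sigma ** 2))    # neighborhood weight, equal to 1.0 at the winner
        codebook[k] = [w + lr * h * (x - w) for w, x in zip(code, z)]
    return winner

if __name__ == "__main__":
    # Usage: organize a 4x4 codebook of 3-D vectors on random inputs.
    random.seed(0)
    width, height, dim = 4, 4, 3
    codebook = [[random.random() for _ in range(dim)] for _ in range(width * height)]
    for _ in range(200):
        som_vq_update([random.random() for _ in range(dim)], codebook, width)
```

Because every code near the winner on the grid is pulled toward the same input, adjacent grid cells end up holding similar vectors, which is the neighborhood-preserving property described above. A plain VQ-VAE codebook update would move only the winner. In common SOM practice, `lr` and `sigma` are annealed toward small values so the map first organizes globally and then refines locally; that schedule is omitted here.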
Findings
Produces more learnable token sequences in the evaluated domains.
Enables intuitive human-in-the-loop control through geometric manipulation.
Demonstrates effective control in human motion generation tasks.
Abstract
Vector-quantized representations enable powerful discrete generative models but lack semantic structure in token space, limiting interpretable human control. We introduce SOM-VQ, a tokenization method that combines vector quantization with Self-Organizing Maps to learn discrete codebooks with explicit low-dimensional topology. Unlike standard VQ-VAE, SOM-VQ uses topology-aware updates that preserve neighborhood structure: nearby tokens on a learned grid correspond to semantically similar states, enabling direct geometric manipulation of the latent space. We demonstrate that SOM-VQ produces more learnable token sequences in the evaluated domains while providing an explicit navigable geometry in code space. Critically, the topological organization enables intuitive human-in-the-loop control: users can steer generation by manipulating distances in token space, achieving semantic alignment…
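To make "manipulating distances in token space" concrete, here is one simple illustration, assumed for this sketch rather than taken from the paper: a user picks a target token on the grid, and a generator's next-token logits are penalized in proportion to each token's grid distance from that target. The helper names (`grid_distance`, `steer_logits`, `softmax`) and the `strength` parameter are hypothetical, and the row-major grid layout is an assumption.

```python
import math

def grid_distance(i, j, width):
    """Euclidean distance between two token indices laid out row-major on a 2-D grid."""
    (ri, ci), (rj, cj) = divmod(i, width), divmod(j, width)
    return math.hypot(ri - rj, ci - cj)

def steer_logits(logits, target, width, strength=1.0):
    """Hypothetical steering: subtract strength * grid distance to the target token,
    so tokens near the target on the grid become more likely."""
    return [l - strength * grid_distance(k, target, width) for k, l in enumerate(logits)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

if __name__ == "__main__":
    # Usage: uniform logits over a 3x3 grid, steered toward the center token (index 4).
    probs = softmax(steer_logits([0.0] * 9, target=4, width=3))
    print([round(p, 3) for p in probs])
```

The idea only works because of the topology: in an unstructured VQ codebook, grid distance between token indices carries no semantic meaning, so the same penalty would bias generation arbitrarily. Setting `strength` to 0 recovers the unsteered distribution.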
Taxonomy
Topics: Music Technology and Sound Studies · Human Motion and Animation · 3D Shape Modeling and Analysis
