SGC-VQGAN: Towards Complex Scene Representation via Semantic Guided Clustering Codebook

Chenjing Ding; Chiyu Wang; Boshi Liu; Xi Guo; Weixuan Tang; Wei Wu

arXiv:2409.06105 · cs.CV · September 11, 2024

TL;DR

SGC-VQGAN introduces a semantic-guided clustering approach to improve vector quantization for complex scene representation, enhancing semantic consistency and codebook utilization without extra parameters, leading to state-of-the-art results.

Contribution

It proposes a novel Semantic Online Clustering method for vector quantization, addressing codebook collapse and imbalance, and integrates multi-level features for better scene representation.

Findings

01

Achieves state-of-the-art reconstruction quality.

02

Improves downstream task performance.

03

Addresses codebook collapse issues.

Abstract

Vector quantization (VQ) is a method for deterministically learning features through discrete codebook representations. Recent works have used visual tokenizers to discretize visual regions for self-supervised representation learning. However, a notable limitation of these tokenizers is their lack of semantics, as they are derived solely from the pretext task of reconstructing raw image pixels in an auto-encoder paradigm. Additionally, issues such as imbalanced codebook distribution and codebook collapse can degrade performance because the codebook is used inefficiently. To address these challenges, we introduce SGC-VQGAN, which uses a Semantic Online Clustering method to enhance token semantics through Consistent Semantic Learning. Using inference results from a segmentation model, our approach constructs a temporospatially consistent semantic codebook, addressing issues of codebook…
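
To make the terms in the abstract concrete, the following is a minimal sketch of plain nearest-neighbour vector quantization together with a codebook utilization measurement. It is not the SGC-VQGAN method or its Semantic Online Clustering step; the function names `quantize` and `codebook_usage`, the toy codebook, and the toy features are all assumptions chosen for illustration. A codebook where only a few entries are ever selected is the kind of imbalance or collapse the abstract refers to.

```python
from collections import Counter


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Distance is squared Euclidean; ties go to the lowest index.
    """
    indices = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def codebook_usage(indices, codebook_size):
    """Return (fraction of entries used at least once, per-entry counts)."""
    counts = Counter(indices)
    histogram = [counts.get(k, 0) for k in range(codebook_size)]
    return len(counts) / codebook_size, histogram


# Hypothetical toy data: four codebook entries, five 2-D features.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [-5.0, -5.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.1], [1.1, 0.8], [0.0, 0.3]]

indices = quantize(features, codebook)
utilization, histogram = codebook_usage(indices, len(codebook))
print(indices, utilization, histogram)
```

In this toy case only two of the four entries are ever selected, so half of the codebook is unused. In a real tokenizer the features would come from an encoder and the codebook would be learned, which this sketch does not attempt to show.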

Taxonomy

Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques · Time Series Analysis and Forecasting