MOC-RVQ: Multilevel Codebook-Assisted Digital Generative Semantic Communication
Yingbin Zhou, Yaping Sun, Guanying Chen, Xiaodong Xu, Hao Chen, Binhong Huang, Shuguang Cui, Ping Zhang

TL;DR
This paper introduces MOC-RVQ, a multilevel semantic communication system that combines a multi-head octonary codebook (MOC) and residual vector quantization (RVQ) with a Swin Transformer-based noise reduction block (NRB) to transmit images efficiently and at high quality.
Contribution
The paper proposes a multilevel generative semantic communication framework trained in two stages. The first stage learns a multi-head octonary codebook with RVQ; the second adds a Swin Transformer-based noise reduction block to improve image transmission.
Findings
Outperforms conventional methods like BPG and JPEG in image quality.
Achieves comparable performance to analog JSCC with significantly lower bandwidth.
Requires only one-sixth of the channel bandwidth ratio for high-quality transmission.
Abstract
Vector quantization-based image semantic communication systems have successfully boosted transmission efficiency, but face challenges with conflicting requirements between codebook design and digital constellation modulation. Traditional codebooks need wide index ranges, while modulation favors few discrete states. To address this, we propose a multilevel generative semantic communication system with a two-stage training framework. In the first stage, we train a high-quality codebook, using a multi-head octonary codebook (MOC) to compress the index range. In addition, a residual vector quantization (RVQ) mechanism is integrated for effective multilevel communication. In the second stage, a noise reduction block (NRB) based on Swin Transformer is introduced, coupled with the multilevel codebook from the first stage, serving as a high-quality semantic knowledge base (SKB) for…
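The interplay between a small per-head codebook and residual quantization can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the codebooks are random rather than learned, "octonary" is assumed to mean 8 entries per head (so each index fits in 3 bits), and the names `quantize_heads`, `rvq_encode`, and `rvq_decode` are hypothetical. The channel, the modulation, and the noise reduction block are not modeled.

```python
import numpy as np

def quantize_heads(x, codebooks):
    """Split x into H chunks and map each chunk to the nearest entry of its head.

    x: (D,) vector; codebooks: (H, K, D // H) array.
    Returns (indices of shape (H,), quantized vector of shape (D,)).
    """
    H, K, d = codebooks.shape
    chunks = x.reshape(H, d)
    # Squared distance from every chunk to each of its head's K entries: (H, K)
    dists = ((chunks[:, None, :] - codebooks) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    quantized = codebooks[np.arange(H), idx].reshape(-1)
    return idx, quantized

def rvq_encode(x, levels):
    """Quantize x level by level; each level encodes the previous residual."""
    residual = x.astype(float).copy()
    all_idx = []
    for cb in levels:
        idx, q = quantize_heads(residual, cb)
        all_idx.append(idx)
        residual = residual - q
    return np.stack(all_idx)  # shape (L, H), every value in [0, K)

def rvq_decode(indices, levels, n_levels=None):
    """Sum the selected entries of the first n_levels levels."""
    n = len(levels) if n_levels is None else n_levels
    H = levels[0].shape[0]
    out = 0
    for idx, cb in zip(indices[:n], levels[:n]):
        out = out + cb[np.arange(H), idx].reshape(-1)
    return out

# Assumed toy sizes: 16-dim feature, 4 heads, 8 entries per head, 3 RVQ levels.
rng = np.random.default_rng(0)
D, H, K, L = 16, 4, 8, 3
levels = []
for level in range(L):
    cb = rng.normal(scale=0.5 ** level, size=(H, K, D // H))
    cb[:, 0, :] = 0.0  # a zero entry guarantees a level never increases the error
    levels.append(cb)

x = rng.normal(size=D)
indices = rvq_encode(x, levels)
errors = [np.linalg.norm(x - rvq_decode(indices, levels, n)) for n in range(1, L + 1)]
print(indices.shape, errors)
```

Each head emits an index in the range 0 to 7 regardless of how many heads or levels are used, which is the property that keeps the index range small. Decoding with only the first few levels gives a coarser reconstruction, and each further level refines it.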
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications · Fractal and DNA sequence analysis
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Adam · Dropout · Linear Layer · Stochastic Depth · Absolute Position Encodings · Layer Normalization
