LG-VQ: Language-Guided Codebook Learning

Guotao Liang; Baoquan Zhang; Yaowei Wang; Xutao Li; Yunming Ye; Huaibin Wang; Chuyao Luo; Kola Ye; Linfeng Luo

arXiv:2405.14206 · cs.CV · October 10, 2024 · 1 citation

TL;DR

LG-VQ introduces a language-guided framework for learning codebooks in vector quantization, aligning codes with text semantics to enhance multi-modal image synthesis and related tasks.

Contribution

It proposes a novel, model-agnostic language-guided codebook learning method with alignment modules, improving multi-modal task performance.

Findings

01 Superior reconstruction quality

02 Enhanced multi-modal task performance

03 Effective alignment of codes with text semantics

Abstract

Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an autoregressive manner. Although existing methods have shown superior performance, most of them learn a single-modal codebook (e.g., image), resulting in suboptimal performance when the codebook is applied to multi-modal downstream tasks (e.g., text-to-image, image captioning) due to the existence of modal gaps. In this paper, we propose a novel language-guided codebook learning framework, called LG-VQ, which aims to learn a codebook that can be aligned with the text to improve the performance of multi-modal downstream tasks. Specifically, we first introduce pre-trained text semantics as prior knowledge, then design two novel alignment modules…
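
To make the codebook idea in the abstract concrete, here is a minimal sketch of the generic VQ lookup step: each feature vector is replaced by the index of its nearest codebook entry. This is not the LG-VQ method and does not attempt the alignment modules, which the truncated abstract does not describe. The function names (`quantize`, `decode`) and the toy 2-D codebook are hypothetical, chosen only for illustration.

```python
# Generic vector-quantization lookup in plain Python.
# Assumption: this illustrates standard nearest-neighbour VQ only;
# all names and values below are hypothetical, not from the paper.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    indices = []
    for f in features:
        distances = [squared_distance(f, c) for c in codebook]
        indices.append(distances.index(min(distances)))
    return indices

def decode(indices, codebook):
    """Recover the quantized vectors from a sequence of discrete codes."""
    return [codebook[i] for i in indices]

# Toy codebook with three 2-D entries, and three encoder features.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

codes = quantize(features, codebook)
print(codes)                    # discrete code sequence for the "image"
print(decode(codes, codebook))  # quantized vectors passed to a decoder
```

In a real model the features would come from a learned image encoder and the codebook would be trained jointly with it. An autoregressive generator would then model the sequence of indices rather than the raw pixels.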

Videos

LG-VQ: Language-Guided Codebook Learning · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques