Exploration into Translation-Equivariant Image Quantization

Woncheol Shin; Gyubok Lee; Jiyoung Lee; Eunyi Lyou; Joonseok Lee; Edward Choi

arXiv:2112.00384·cs.CV·February 28, 2023

TL;DR

This paper introduces a translation-equivariant image quantization method that enforces orthogonality among codebook embeddings, improving sample efficiency and accuracy in text-to-image and image-to-text generation tasks.

Contribution

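It shows that current vector quantization methods lose translation equivariance in the quantized space because of aliasing, and proposes an orthogonality constraint on the codebook embeddings to achieve translation equivariance without focusing on anti-aliasing.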

Findings

01 Improves sample efficiency in image and text generation tasks.

02 Achieves up to +11.9% accuracy in text-to-image generation.

03 Enhances image-to-text generation accuracy by +3.9%.

Abstract

This exploratory study finds that current image quantization (vector quantization) methods do not satisfy translation equivariance in the quantized space because of aliasing. Instead of focusing on anti-aliasing, we propose a simple yet effective way to achieve translation-equivariant image quantization by enforcing orthogonality among the codebook embeddings. To explore the advantages of translation-equivariant image quantization, we conduct three proof-of-concept experiments with a carefully controlled dataset: (1) text-to-image generation, where the quantized image indices are the target to predict, (2) image-to-text generation, where the quantized image indices are given as a condition, and (3) training on a smaller training set to analyze sample efficiency. From the strictly controlled experiments, we empirically verify that the translation-equivariant image quantizer improves not only…
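
The idea of enforcing orthogonality among the codebook embeddings can be illustrated with a small penalty function. This is a minimal sketch, assuming the constraint is applied as a soft penalty on the cosine-similarity matrix of the codebook; the function name `orthogonality_penalty`, the normalization by K squared, and the toy codebooks are hypothetical choices for illustration and are not taken from the abstract.

```python
import math


def orthogonality_penalty(codebook):
    """Mean squared deviation of the codebook's cosine-similarity matrix
    from the identity. Zero means all embeddings are mutually orthogonal.

    `codebook` is a list of K embedding vectors (lists of floats).
    The name and the 1/K^2 scaling are assumptions made for this sketch.
    """
    k = len(codebook)

    # L2-normalize each embedding so the penalty depends only on angles.
    unit = []
    for row in codebook:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard zero vectors
        unit.append([x / norm for x in row])

    # Accumulate squared differences between cosine similarities and identity.
    total = 0.0
    for i in range(k):
        for j in range(k):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            target = 1.0 if i == j else 0.0
            total += (cos - target) ** 2
    return total / (k * k)


# Toy codebooks (hypothetical): one with orthogonal rows, one without.
orthogonal = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 0.5]]
correlated = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(orthogonality_penalty(orthogonal))
print(orthogonality_penalty(correlated))
```

In a training setup, a term of this kind would presumably be added to the quantizer's loss with a weighting coefficient, so that gradient descent pushes the embeddings toward mutual orthogonality; the weighting and the exact form used by the authors are not specified here.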

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques