Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data

Tim Elsner; Paula Usinger; Victor Czech; Gregor Kobsik; Yanjiang He; Isaak Lim; Leif Kobbelt

arXiv:2407.11913·cs.CV·August 6, 2024

Open Access

TL;DR

This paper introduces a novel quantised autoencoder that learns global basis functions for image representation, improving compression by capturing holistic information beyond local patches.

Contribution

The paper proposes a spectral-inspired, data-driven approach to learning global basis functions in a VQ-VAE framework, which it reports surpasses traditional local patch-based methods.

Findings

01 Enhanced image compression performance

02 Global basis functions effectively capture holistic image features

03 Outperforms local patch-based autoencoders in experiments

Abstract

In quantised autoencoders, images are usually split into local patches, each encoded by one token. This representation is redundant in the sense that the same number of tokens is spent per region, regardless of the visual information content in that region. Adaptive discretisation schemes like quadtrees are applied to allocate tokens for patches with varying sizes, but this just varies the region of influence for a token, which nevertheless remains a local descriptor. Modern architectures add an attention mechanism to the autoencoder, which infuses some degree of global information into the local tokens. Despite the global context, tokens are still associated with a local image region. In contrast, our method is inspired by spectral decompositions, which transform an input signal into a superposition of global frequencies. Taking the data-driven perspective, we learn custom basis functions…
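
The contrast the abstract draws between local tokens and global frequencies can be illustrated with a small sketch. This is not the paper's method: the paper learns its basis functions and quantises the result, whereas the code below uses a fixed cosine (DCT-II) basis on a 1D signal as a stand-in and performs no quantisation. All function names, the toy signal, and the patch size are assumptions made for illustration. It shows that perturbing one spectral coefficient changes every sample, while perturbing one patch token changes only its own patch.

```python
import math

def dct_basis(n):
    """Orthonormal DCT-II basis: row k is a cosine spanning all n samples."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                      for i in range(n)])
    return basis

def analyse(signal, basis):
    """Project the signal onto each basis function (one coefficient each)."""
    return [sum(b * s for b, s in zip(row, signal)) for row in basis]

def synthesise(coeffs, basis):
    """Rebuild the signal as a superposition of the global basis functions."""
    n = len(basis[0])
    return [sum(c * row[i] for c, row in zip(coeffs, basis)) for i in range(n)]

def patch_tokens(signal, patch):
    """Local baseline (assumed): one token, the patch mean, per patch."""
    return [sum(signal[i:i + patch]) / patch
            for i in range(0, len(signal), patch)]

def decode_patches(tokens, patch):
    """Each local token only fills in its own patch."""
    return [t for t in tokens for _ in range(patch)]

def changed_indices(a, b, tol=1e-9):
    """Indices at which two decoded signals differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > tol]

# Hypothetical toy signal: flat on the left, detailed on the right.
signal = [0.0, 0.1, 0.0, 0.1, 0.2, 0.9, 1.0, 0.3]

# Global representation: every coefficient influences the whole signal.
basis = dct_basis(len(signal))
coeffs = analyse(signal, basis)
reconstruction = synthesise(coeffs, basis)
perturbed = list(coeffs)
perturbed[1] += 1.0
global_changed = changed_indices(reconstruction, synthesise(perturbed, basis))

# Local representation: every token influences only its own patch.
patch = 2
tokens = patch_tokens(signal, patch)
perturbed_tokens = list(tokens)
perturbed_tokens[2] += 1.0
local_changed = changed_indices(decode_patches(tokens, patch),
                                decode_patches(perturbed_tokens, patch))

print("samples affected by one global coefficient:", global_changed)
print("samples affected by one local token:", local_changed)
```

In this toy setting the local scheme spends the same number of tokens on the flat left half as on the detailed right half, which is the redundancy the abstract describes.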

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Softmax · Attention Is All You Need · VQ-VAE