Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data
Tim Elsner, Paula Usinger, Victor Czech, Gregor Kobsik, Yanjiang He, Isaak Lim, Leif Kobbelt

TL;DR
This paper introduces a novel quantised autoencoder that learns global basis functions for image representation, improving compression by capturing holistic information beyond local patches.
Contribution
It proposes a spectral-inspired, data-driven approach to learn global basis functions in a VQ-VAE framework, surpassing traditional local patch-based methods.
Findings
Enhanced image compression performance
Global basis functions effectively capture holistic image features
Outperforms local patch-based autoencoders in experiments
Abstract
In quantised autoencoders, images are usually split into local patches, each encoded by one token. This representation is redundant in the sense that the same number of tokens is spent per region, regardless of the visual information content in that region. Adaptive discretisation schemes like quadtrees are applied to allocate tokens for patches with varying sizes, but this just varies the region of influence for a token, which nevertheless remains a local descriptor. Modern architectures add an attention mechanism to the autoencoder, which infuses some degree of global information into the local tokens. Despite the global context, tokens are still associated with a local image region. In contrast, our method is inspired by spectral decompositions, which transform an input signal into a superposition of global frequencies. Taking the data-driven perspective, we learn custom basis functions…
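The spectral idea the abstract refers to can be illustrated with a small NumPy sketch. It is not the paper's method: a fixed cosine (DCT-II) basis stands in for the learned basis functions, uniform rounding stands in for a codebook lookup, and the names `cosine_basis`, `encode_global`, `decode_global`, and `quantise` are hypothetical helpers chosen for this illustration. It only shows what it means for every token to describe the whole image rather than one patch.

```python
import numpy as np


def cosine_basis(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k is a cosine that spans all n samples."""
    k = np.arange(n)[:, None]          # frequency index
    x = np.arange(n)[None, :]          # spatial index
    basis = np.cos(np.pi * (x + 0.5) * k / n) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)           # rescale the constant row to unit norm
    return basis


def encode_global(image: np.ndarray, m: int) -> np.ndarray:
    """Project a 2D image onto the m x m lowest-frequency global basis functions."""
    b_h = cosine_basis(image.shape[0])
    b_w = cosine_basis(image.shape[1])
    coeffs = b_h @ image @ b_w.T       # one coefficient per global basis function
    return coeffs[:m, :m]


def decode_global(coeffs: np.ndarray, shape: tuple) -> np.ndarray:
    """Reconstruct the image as a superposition of global basis functions."""
    h, w = shape
    m = coeffs.shape[0]
    b_h = cosine_basis(h)[:m]
    b_w = cosine_basis(w)[:m]
    return b_h.T @ coeffs @ b_w


def quantise(coeffs: np.ndarray, step: float) -> np.ndarray:
    """Uniform scalar quantisation (an assumption; stands in for a learned codebook)."""
    return np.round(coeffs / step) * step
```

In this sketch, decoding a single non-zero coefficient produces a pattern that covers the entire image, whereas a patch token in a conventional quantised autoencoder only influences its own region. Keeping fewer coefficients (a smaller `m`) lowers the token count for the whole image at once instead of per region.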
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · VQ-VAE
