Hierarchical Vector-Quantized Latents for Perceptual Low-Resolution Video Compression
Manikanta Kotthapalli, Banafsheh Rekabdar

TL;DR
This paper introduces a hierarchical vector-quantized autoencoder for low-resolution video compression, enabling efficient storage and transmission with high perceptual quality suitable for edge devices.
Contribution
The model extends VQ-VAE-2 to a spatiotemporal setting with a hierarchical latent structure, optimized for low-resolution video compression on resource-constrained devices.
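The core operation that VQ-VAE-style models, VQ-VAE-2 included, add on top of an ordinary autoencoder is vector quantization. Each latent vector produced by the encoder is replaced by its nearest entry in a learned codebook, and only the integer index of that entry needs to be stored or transmitted. The minimal sketch below shows that lookup in isolation. It assumes the (time, height, width) latent grid has already been flattened into a list of vectors, and it uses a tiny hypothetical codebook. It omits codebook learning, the straight-through gradient, and the commitment loss, and it is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Map each latent vector to its nearest codebook entry.

    Returns the integer code indices (what a compressed stream would hold)
    and the quantized vectors (what the decoder would receive).
    Ties are broken in favor of the lowest index.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]
    quantized = [list(codebook[k]) for k in indices]
    return indices, quantized


# Hypothetical 2-D codebook and three latent vectors, e.g. three positions
# of a flattened (time, height, width) latent grid.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
latents = [[0.9, 1.2], [-0.8, 0.4], [0.1, -0.1]]
indices, quantized = quantize(latents, codebook)
```

Here `indices` comes out as `[1, 2, 0]`. In a real model the codebook would hold hundreds of learned vectors, and the nearest-neighbor search would be a batched tensor operation rather than a Python loop.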
Findings
Achieves 25.96 dB PSNR and 0.8375 SSIM on UCF101
Improves over the baseline by 1.41 dB PSNR
Lightweight model with 18.5M parameters
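PSNR, the headline metric above, is defined as 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the original and reconstructed pixels. The sketch below computes it for two flattened pixel lists. It assumes 8-bit pixels (MAX = 255) by default. How the reported 25.96 dB is aggregated (per frame, per clip, or over the whole test set) is not stated here, so the sketch makes no claim about that.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in MSE, a gain of 1.41 dB at the same peak value corresponds to the MSE being lower by a factor of 10^0.141 ≈ 1.38.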
Abstract
The exponential growth of video traffic has placed increasing demands on bandwidth and storage infrastructure, particularly for content delivery networks (CDNs) and edge devices. While traditional video codecs like H.264 and HEVC achieve high compression ratios, they are designed primarily for pixel-domain reconstruction and lack native support for machine learning-centric latent representations, limiting their integration into deep learning pipelines. In this work, we present a Multi-Scale Vector Quantized Variational Autoencoder (MS-VQ-VAE) designed to generate compact, high-fidelity latent representations of low-resolution video, suitable for efficient storage, transmission, and client-side decoding. Our architecture extends the VQ-VAE-2 framework to a spatiotemporal setting, introducing a two-level hierarchical latent structure built with 3D residual convolutions. The model is…
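To see why storing code indices rather than pixels is compact, it helps to count bits for a two-level hierarchy. The sketch below does this arithmetic for a hypothetical configuration: a 16-frame 64×64 RGB clip, a bottom level that downsamples by (2, 4, 4) in (time, height, width), a top level that downsamples by (4, 8, 8), and 512-entry codebooks at both levels. None of these strides or codebook sizes comes from the paper; they are placeholders. The count also assumes fixed-length codes, so any entropy coding of the indices would reduce it further.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Shape3D = Tuple[int, int, int]  # (time, height, width)


@dataclass(frozen=True)
class LatentLevel:
    """One level of a hypothetical latent hierarchy."""
    name: str
    stride: Shape3D      # downsampling factor per axis relative to the input clip
    codebook_size: int   # number of codebook entries at this level


def latent_grid(clip_shape: Shape3D, stride: Shape3D) -> Shape3D:
    """Size of the index grid after strided 3D downsampling."""
    if any(dim % s for dim, s in zip(clip_shape, stride)):
        raise ValueError("clip shape must be divisible by the stride")
    t, h, w = (dim // s for dim, s in zip(clip_shape, stride))
    return (t, h, w)


def index_bits(clip_shape: Shape3D, levels: List[LatentLevel]) -> int:
    """Total bits to store every level's indices with fixed-length codes."""
    total = 0
    for level in levels:
        t, h, w = latent_grid(clip_shape, level.stride)
        total += t * h * w * math.ceil(math.log2(level.codebook_size))
    return total


# Hypothetical configuration (not the paper's).
clip = (16, 64, 64)
levels = [
    LatentLevel("bottom", (2, 4, 4), 512),
    LatentLevel("top", (4, 8, 8), 512),
]
compressed_bits = index_bits(clip, levels)
raw_bits = clip[0] * clip[1] * clip[2] * 3 * 8  # 8-bit RGB
ratio = raw_bits / compressed_bits
```

Under these assumptions the bottom level holds 2,048 indices and the top level 256, for 20,736 bits in total against 1,572,864 raw bits, a ratio of roughly 76:1. The coarse top level costs little because it is downsampled further, which is the usual motivation for a hierarchy: it captures global structure cheaply, while the finer bottom level carries detail.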
Taxonomy
Topics: Advanced Data Compression Techniques · Video Coding and Compression Technologies · Image and Video Quality Assessment
