Turbo-VAED: Fast and Stable Transfer of Video-VAEs to Mobile Devices
Ya Zou, Jingfeng Yao, Siyuan Yu, Shuai Zhang, Wenyu Liu, Xinggang Wang

TL;DR
This paper introduces Turbo-VAED, a novel, efficient VAE decoder for mobile devices. It enables real-time 720p video decoding by reducing parameter count and redesigning the upsampling path, with minimal quality loss and large speedups.
Contribution
We propose a universal mobile-oriented VAE decoder, Turbo-VAED, with a new architecture and training method that significantly accelerates video VAE inference on mobile devices.
Findings
Achieves real-time 720p video decoding on mobile devices.
Reduces model parameters by up to 82.5%.
Speeds up VAE inference by up to 84.5x on GPUs.
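The paper attributes much of its parameter reduction to 3D depthwise separable convolutions. The sketch below shows where the savings come from by counting parameters for a single layer. It is illustrative only: the 256-channel, 3x3x3 layer is a hypothetical example, and the per-layer saving it prints does not reproduce the paper's whole-model figure of up to 82.5%, which depends on which layers are actually replaced.

```python
def conv3d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameter count of a dense 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3 + (c_out if bias else 0)


def dw_separable_conv3d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameter count of a 3D depthwise separable convolution:
    one k x k x k depthwise filter per input channel, then a 1x1x1 pointwise conv."""
    depthwise = c_in * k ** 3 + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer: 256 -> 256 channels, 3x3x3 kernel.
dense = conv3d_params(256, 256, 3)
separable = dw_separable_conv3d_params(256, 256, 3)
print(dense, separable, f"{1 - separable / dense:.1%}")  # 1769728 72960 95.9%
```

The dense layer's cost grows with c_in * c_out * k^3, while the separable version pays k^3 only once per input channel and keeps the channel mixing in a cheap 1x1x1 convolution.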
Abstract
There is growing demand for deploying large generative AI models on mobile devices. For recent popular video generative models, however, the Variational AutoEncoder (VAE) is one of the major computational bottlenecks: large parameter counts and kernels mismatched to mobile hardware cause out-of-memory errors or extremely slow inference on mobile devices. To address this, we propose a low-cost solution that efficiently transfers widely used video VAEs to mobile devices. (1) We analyze redundancy in existing VAE architectures and derive empirical design insights. By integrating 3D depthwise separable convolutions into our model, we significantly reduce the number of parameters. (2) We observe that the upsampling techniques in mainstream video VAEs are poorly suited to mobile hardware and form the main bottleneck. In response, we propose a decoupled 3D pixel shuffle scheme that slashes end-to-end…
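The abstract is truncated and does not say how the 3D pixel shuffle is "decoupled". The NumPy sketch below assumes one plausible reading: instead of a single 7-D permute that moves channels into time, height, and width at once, it runs a temporal shuffle followed by an ordinary 2D spatial shuffle, each needing at most a 6-D permute. Lower-rank permutes may map better onto mobile kernels, but that motivation is our assumption, not a claim from the paper. The function names and the channel-ordering convention (channel = c*rt*rs*rs + i*rs*rs + j*rs + k) are hypothetical.

```python
import numpy as np


def pixel_shuffle_3d(x: np.ndarray, rt: int, rs: int) -> np.ndarray:
    """Joint 3D pixel shuffle: (C*rt*rs*rs, T, H, W) -> (C, T*rt, H*rs, W*rs)."""
    c_full, t, h, w = x.shape
    c = c_full // (rt * rs * rs)
    x = x.reshape(c, rt, rs, rs, t, h, w)  # split channel into (c, i, j, k)
    x = x.transpose(0, 4, 1, 5, 2, 6, 3)   # -> (c, t, i, h, j, w, k): one 7-D permute
    return x.reshape(c, t * rt, h * rs, w * rs)


def pixel_shuffle_3d_decoupled(x: np.ndarray, rt: int, rs: int) -> np.ndarray:
    """Hypothetical decoupled variant: temporal shuffle, then a 2D spatial
    shuffle per frame. Each step uses at most a 6-D permute."""
    c_full, t, h, w = x.shape
    c = c_full // (rt * rs * rs)
    # Temporal step: (c, i, s, t, h, w) -> (c, s, t, i, h, w), fold i into time.
    x = x.reshape(c, rt, rs * rs, t, h, w).transpose(0, 2, 3, 1, 4, 5)
    x = x.reshape(c * rs * rs, t * rt, h, w)
    # Spatial step: (c, j, k, T, h, w) -> (c, T, h, j, w, k), fold j, k into H, W.
    x = x.reshape(c, rs, rs, t * rt, h, w).transpose(0, 3, 4, 1, 5, 2)
    return x.reshape(c, t * rt, h * rs, w * rs)


# Usage: C=2 output channels, rt=2 temporal and rs=2 spatial upsampling factors.
x = np.arange(16 * 3 * 4 * 5).reshape(16, 3, 4, 5)
joint = pixel_shuffle_3d(x, rt=2, rs=2)
decoupled = pixel_shuffle_3d_decoupled(x, rt=2, rs=2)
print(joint.shape, np.array_equal(joint, decoupled))  # (2, 6, 8, 10) True
```

Because both functions produce identical outputs, the decoupled form can stand in for the joint one without retraining; only the sequence of tensor operations the runtime has to execute changes.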
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Video Coding and Compression Technologies
