Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios
Huafeng Shi, Jianzhong Liang, Rongchang Xie, Xian Wu, Cheng Chen, Chang Liu

TL;DR
Aquarius is a family of industry-level video generation models for marketing scenarios, built to scale to clusters of thousands of xPUs and to synthesize high-fidelity video across multiple aspect ratios, resolutions, and durations.
Contribution
The paper presents Aquarius, a framework that pairs new model architectures with training and inference infrastructure for large-scale, high-quality video generation in industrial marketing applications.
Findings
Achieved 36% MFU (Model FLOPs Utilization) at large scale with hybrid parallelism.
Achieved a 2.35x inference speedup using diffusion cache and attention optimization.
Supported multi-aspect ratio, multi-resolution, and multi-duration video generation.
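For readers unfamiliar with the metric, MFU is commonly defined as the FLOPs the model requires per second of training divided by the aggregate peak FLOPs of the hardware. The sketch below illustrates only that arithmetic. The function name, cluster size, per-device peak, and per-step FLOP count are hypothetical values chosen to land on 36%; none of them come from the report.

```python
def model_flops_utilization(model_flops_per_step, step_time_s,
                            num_devices, peak_flops_per_device):
    """Return MFU as a fraction: achieved model FLOP/s over peak cluster FLOP/s."""
    if step_time_s <= 0 or num_devices <= 0 or peak_flops_per_device <= 0:
        raise ValueError("step time, device count, and peak FLOP/s must be positive")
    achieved_flops_per_s = model_flops_per_step / step_time_s
    peak_cluster_flops_per_s = num_devices * peak_flops_per_device
    return achieved_flops_per_s / peak_cluster_flops_per_s


# Hypothetical numbers (assumptions, not from the report):
# 1000 devices at 1e15 peak FLOP/s each, one training step needing
# 3.6e17 model FLOPs and taking 1 second of wall-clock time.
example_mfu = model_flops_utilization(
    model_flops_per_step=3.6e17,
    step_time_s=1.0,
    num_devices=1000,
    peak_flops_per_device=1e15,
)
print(f"MFU = {example_mfu:.0%}")
```

Only the FLOPs the model mathematically requires are counted in the numerator, so extra work such as activation recomputation lowers MFU instead of inflating it.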
Abstract
This report introduces Aquarius, a family of industry-level video generation models for marketing scenarios, designed for clusters of thousands of xPUs and models with hundreds of billions of parameters. Leveraging an efficient engineering architecture and algorithmic innovation, Aquarius demonstrates exceptional performance in high-fidelity, multi-aspect-ratio, and long-duration video synthesis. By disclosing the framework's design details, we aim to demystify industrial-scale video generation systems and catalyze advancements in the generative video community.
The Aquarius framework consists of five components:
Distributed Graph and Video Data Processing Pipeline: Manages tens of thousands of CPUs and thousands of xPUs via automated task distribution, enabling efficient video data processing. Additionally, we are about to open-source the entire data processing framework named…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation
Methods: Softmax · Attention Is All You Need · Inpainting · Diffusion
