Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios

Huafeng Shi; Jianzhong Liang; Rongchang Xie; Xian Wu; Cheng Chen; Chang Liu

arXiv:2505.10584 · cs.CV · May 19, 2025

Open Access

TL;DR

Aquarius is a scalable family of industry-level video generation models tailored for marketing. It combines large-scale training and inference infrastructure with multi-aspect-ratio, multi-resolution, and multi-duration generation to produce high-fidelity video efficiently across marketing scenarios.

Contribution

The paper presents Aquarius, a comprehensive framework with novel architectures and infrastructure for large-scale, high-quality video generation tailored for industrial marketing applications.

Findings

01. Achieved 36% MFU (model FLOPs utilization) at large scale with hybrid parallelism.

02. Achieved a 2.35x inference speedup using diffusion cache and attention optimization.

03. Supported multi-aspect-ratio, multi-resolution, and multi-duration video generation.
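
For readers unfamiliar with the MFU figure in the first finding, the following is a minimal sketch of how model FLOPs utilization is commonly computed: the model FLOPs actually executed per second, divided by the aggregate theoretical peak of the hardware. The function name, its parameters, and all numbers below are illustrative assumptions chosen to land on 36%; they are not taken from the paper, which may measure MFU differently.

```python
def model_flops_utilization(
    model_flops_per_step: float,
    step_time_s: float,
    num_devices: int,
    peak_flops_per_device: float,
) -> float:
    """Return MFU as a fraction in [0, 1] (hypothetical helper).

    model_flops_per_step: FLOPs the model needs for one training step
        (forward + backward), excluding recomputation overhead.
    step_time_s: measured wall-clock time of one step, in seconds.
    num_devices: number of accelerators participating in the step.
    peak_flops_per_device: theoretical peak FLOP/s of one accelerator.
    """
    if step_time_s <= 0 or num_devices <= 0 or peak_flops_per_device <= 0:
        raise ValueError("step time, device count, and peak FLOP/s must be positive")
    achieved_flops_per_s = model_flops_per_step / step_time_s
    peak_flops_per_s = num_devices * peak_flops_per_device
    return achieved_flops_per_s / peak_flops_per_s


# Illustrative numbers only (assumptions, not reported in the paper):
# 1000 devices at 1 PFLOP/s peak each, one step of 3.6e17 model FLOPs in 1 s.
mfu = model_flops_utilization(
    model_flops_per_step=3.6e17,
    step_time_s=1.0,
    num_devices=1000,
    peak_flops_per_device=1.0e15,
)
print(f"MFU = {mfu:.0%}")
```

Because the numerator counts only the FLOPs the model itself requires, time spent on communication, pipeline bubbles, or activation recomputation lowers MFU, which is why it is a common yardstick for parallelism strategies at scale.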

Abstract

This report introduces Aquarius, a family of industry-level video generation models for marketing scenarios designed for thousands-xPU clusters and models with hundreds of billions of parameters. Leveraging efficient engineering architecture and algorithmic innovation, Aquarius demonstrates exceptional performance in high-fidelity, multi-aspect-ratio, and long-duration video synthesis. By disclosing the framework's design details, we aim to demystify industrial-scale video generation systems and catalyze advancements in the generative video community.

The Aquarius framework consists of five components:

Distributed Graph and Video Data Processing Pipeline: Manages tens of thousands of CPUs and thousands of xPUs via automated task distribution, enabling efficient video data processing. Additionally, we are about to open-source the entire data processing framework named…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation

Methods: Softmax · Attention Is All You Need · Inpainting · Diffusion