Matten: Video Generation with Mamba-Attention

Yu Gao; Jiancheng Huang; Xiaopeng Sun; Zequn Jie; Yujie Zhong; Lin Ma

arXiv:2405.03025 · cs.CV · May 13, 2024 · 3 cites

Open Access

TL;DR

Matten is a latent diffusion model with a Mamba-Attention architecture for video generation. It models local video content with spatial-temporal attention and global video content with bidirectional Mamba, achieving competitive quality and good scalability at low computational cost.

Contribution

Introduces Matten, a video generation model that combines a Mamba-Attention backbone with latent diffusion and reports performance competitive with Transformer-based and GAN-based models, with better efficiency and scalability.
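Because Matten is a latent diffusion model, its Mamba-Attention backbone acts as the denoiser on compressed video latents rather than on raw pixels. The NumPy sketch below shows the standard DDPM-style forward noising and epsilon-prediction loss that such a denoiser is trained with. The linear beta schedule, the latent shape (frames, channels, height, width), and the function names are illustrative assumptions, not settings taken from the paper; the zero-output denoiser is a hypothetical placeholder for the real backbone.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, as in the original DDPM setup)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(z0, t, alphas_cumprod, noise):
    """Noise clean latents to step t: z_t = sqrt(a_bar_t) * z0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * z0 + np.sqrt(1.0 - a_bar) * noise

def diffusion_loss(denoiser, z0, alphas_cumprod, rng):
    """Epsilon-prediction MSE at one uniformly sampled timestep."""
    t = int(rng.integers(0, len(alphas_cumprod)))
    noise = rng.standard_normal(z0.shape)
    zt = q_sample(z0, t, alphas_cumprod, noise)
    pred = denoiser(zt, t)  # the backbone predicts the added noise
    return float(np.mean((pred - noise) ** 2))

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alphas_cumprod = np.cumprod(1.0 - betas)
# Hypothetical video latent: 16 frames, 4 channels, 32x32 spatial grid (e.g. from a frozen VAE).
z0 = rng.standard_normal((16, 4, 32, 32))
# Hypothetical placeholder denoiser; a real model would be the Mamba-Attention network.
zero_denoiser = lambda zt, t: np.zeros_like(zt)
loss = diffusion_loss(zero_denoiser, z0, alphas_cumprod, rng)
print(loss)
```

A denoiser that always predicts zero gets a loss near 1 (the variance of the unit Gaussian noise), which is a convenient sanity baseline before plugging in a real network.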

Findings

01

Achieves superior FVD scores compared to baseline models.

02

Video quality improves as model complexity increases, indicating good scalability.

03

Maintains competitive performance at minimal computational cost.
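FVD (Fréchet Video Distance) fits a Gaussian to features of real videos and another to features of generated videos, then reports the Fréchet distance between them, d = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2(S_r S_g)^(1/2)); lower is better. The standard metric extracts features with a pretrained I3D network. This NumPy sketch skips that step and takes feature matrices directly, so the random arrays in the usage example are stand-ins for I3D features, and the function names are hypothetical.

```python
import numpy as np

def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets (rows are samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    sqrt_a = _sqrtm_psd(cov_a)
    # Tr((cov_a cov_b)^(1/2)) equals Tr((sqrt_a cov_b sqrt_a)^(1/2)), whose argument is symmetric PSD.
    cross = _sqrtm_psd(sqrt_a @ cov_b @ sqrt_a)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross))

rng = np.random.default_rng(0)
real = rng.standard_normal((2000, 8))                  # stand-in for real-video features
fake_same = rng.standard_normal((2000, 8))             # same distribution
fake_shifted = rng.standard_normal((2000, 8)) + 1.0    # unit mean shift in every dimension
print(frechet_distance(real, fake_same))     # near 0: same distribution
print(frechet_distance(real, fake_shifted))  # near 8: squared mean shift summed over 8 dims
```

Computing the cross term through the symmetric matrix sqrt_a @ cov_b @ sqrt_a keeps everything in real arithmetic with `eigh`, avoiding the complex round-off a general matrix square root can produce.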

Abstract

In this paper, we introduce Matten, a cutting-edge latent diffusion model with a Mamba-Attention architecture for video generation. At minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten is competitive with current Transformer-based and GAN-based models on standard benchmarks, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our designed model and the improvement in video quality, indicating the excellent scalability of Matten.
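The abstract assigns local modeling to spatial-temporal attention and global modeling to bidirectional Mamba, but does not say how the two are wired together. The NumPy sketch below is a toy illustration under stated assumptions: single-head attention, a gated diagonal recurrence standing in for Mamba's selective state-space scan (it is not the real Mamba kernel), and a hypothetical sequential residual order of spatial attention, temporal attention, then a bidirectional scan over all tokens. Function names and the `params` layout are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the second-to-last axis."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def gated_scan(x, w_gate):
    """Causal recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t with input-dependent decay a_t.
    A simplified stand-in for Mamba's selective SSM scan; x has shape (L, D)."""
    a = 1.0 / (1.0 + np.exp(-(x @ w_gate)))  # per-token, per-channel decay in (0, 1)
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a[t] * h + (1.0 - a[t]) * x[t]
        out[t] = h
    return out

def bidirectional_scan(x, w_fwd, w_bwd):
    """Scan forward and backward and sum, so every token receives context from the whole sequence."""
    return gated_scan(x, w_fwd) + gated_scan(x[::-1], w_bwd)[::-1]

def mamba_attention_block(video, params):
    """video: (T frames, N spatial tokens, D channels); residual order is a hypothetical choice."""
    T, N, D = video.shape
    # Local: spatial attention among the N tokens of each frame.
    x = video + self_attention(video, *params["spatial"])
    # Local: temporal attention across the T frames at each spatial location.
    xt = np.swapaxes(x, 0, 1)  # (N, T, D)
    x = x + np.swapaxes(self_attention(xt, *params["temporal"]), 0, 1)
    # Global: bidirectional scan over all T*N tokens, flattened frame by frame.
    flat = x.reshape(T * N, D)
    flat = flat + bidirectional_scan(flat, *params["scan"])
    return flat.reshape(T, N, D)

rng = np.random.default_rng(0)
D = 8

def mats(k):
    """k random D x D projection matrices (hypothetical initialization)."""
    return tuple(rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(k))

params = {"spatial": mats(3), "temporal": mats(3), "scan": mats(2)}
video = rng.standard_normal((4, 16, D))  # 4 frames, 4x4 latent patches, 8 channels
out = mamba_attention_block(video, params)
print(out.shape)  # (4, 16, 8)
```

The attention steps cost grow quadratically only within a frame (N tokens) or a temporal column (T tokens), while the scan touches all T*N tokens in linear time; that split is one plausible reading of how the hybrid keeps computation low.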

Taxonomy

Topics: Human Motion and Animation

Methods: Latent Diffusion Model · Diffusion