GenRec: Unifying Video Generation and Recognition with Diffusion Models

Zejia Weng; Xitong Yang; Zhen Xing; Zuxuan Wu; Yu-Gang Jiang

arXiv:2408.15241 · cs.CV · November 13, 2024

TL;DR

GenRec is a unified diffusion-based framework that enhances video recognition and generation by learning generalized spatial-temporal representations, demonstrating robustness and competitive performance on multiple benchmarks.

Contribution

It introduces the first unified model trained with random-frame conditioning to jointly learn video generation and recognition capabilities.

Findings

01. Achieves 75.8% accuracy on SSV2 and 87.2% on K400 for recognition.

02. Sets new state-of-the-art FVD scores for class-conditioned video generation.

03. Demonstrates robustness with limited input frames.

Abstract

Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually joint optimization of generation and recognition. Building upon Stable Video Diffusion, we introduce GenRec, the first unified framework trained with a random-frame conditioning process so as to learn generalized spatial-temporal representations. The resulting framework naturally supports generation and recognition, and more importantly is robust even when visual inputs contain limited information. Extensive experiments demonstrate the efficacy of GenRec for both recognition and generation. In particular, GenRec achieves competitive recognition performance, offering 75.8% and 87.2% accuracy on SSV2 and K400,…
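
The abstract names a random-frame conditioning process but does not spell out its mechanics here. As a rough illustration only, the sketch below assumes it means keeping a random subset of frames visible as the condition and blanking the rest. The function name `random_frame_condition`, the `min_visible` parameter, and the zero-fill masking are all hypothetical choices for this example, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Frame = List[float]  # toy stand-in for one frame's latent vector


def random_frame_condition(
    frames: Sequence[Frame],
    rng: random.Random,
    min_visible: int = 0,
) -> Tuple[List[Frame], List[int]]:
    """Keep a random subset of frames and zero out the others.

    Returns the masked frames and a 0/1 mask (1 = frame kept as a condition).
    This is an assumed reading of "random-frame conditioning", not GenRec's code.
    """
    num_frames = len(frames)
    if not 0 <= min_visible <= num_frames:
        raise ValueError("min_visible must be between 0 and the number of frames")

    # Draw how many frames stay visible, then which ones.
    num_visible = rng.randint(min_visible, num_frames)
    visible = set(rng.sample(range(num_frames), num_visible))

    mask = [1 if i in visible else 0 for i in range(num_frames)]
    masked = [
        list(frame) if keep else [0.0] * len(frame)
        for frame, keep in zip(frames, mask)
    ]
    return masked, mask


if __name__ == "__main__":
    video = [[float(t + 1)] * 4 for t in range(8)]  # 8 toy frames, 4 values each
    masked, mask = random_frame_condition(video, random.Random(0), min_visible=1)
    print(mask)
```

Under this assumption, a model trained on such inputs would see anything from a single frame to the full clip, which is one plausible way to obtain the robustness to limited visual input that the abstract describes.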

Code & Models

Repositories

wengzejia1/genrec · Official

Videos

GenRec: Unifying Video Generation and Recognition with Diffusion Models · slideslive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Advanced Data Compression Techniques

Methods: Diffusion