SVS-GAN: Leveraging GANs for Semantic Video Synthesis

Khaled M. Seyam; Julian Wiederer; Markus Braun; Bin Yang

arXiv:2409.06074 · cs.CV · September 11, 2024

TL;DR

This paper introduces SVS-GAN, a framework for semantic video synthesis whose tailored architecture and loss functions generate temporally coherent video sequences from semantic maps, outperforming state-of-the-art models on the Cityscapes and KITTI-360 datasets.

Contribution

The paper presents SVS-GAN, a new architecture with a triple-pyramid generator and semantic segmentation-based discriminator for improved semantic video synthesis.
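
As a rough illustration of what a semantic segmentation-based discriminator can look like, the sketch below assumes the OASIS-style formulation listed under Methods: the discriminator labels every pixel with one of N semantic classes or an extra "fake" class, and is trained with per-pixel cross-entropy. The summary does not state SVS-GAN's actual loss, so the function name, the toy logits, and the class count are all hypothetical, and the class-balancing weights used by OASIS are omitted.

```python
import math


def segmentation_discriminator_loss(logits, labels):
    """Mean per-pixel cross-entropy over N+1 classes (hypothetical helper).

    logits: one list of N+1 scores per pixel; index N is the "fake" class.
    labels: one target class index per pixel.
    """
    total = 0.0
    for pixel_logits, target in zip(logits, labels):
        # Numerically stable log-sum-exp of the pixel's scores.
        peak = max(pixel_logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in pixel_logits))
        # Cross-entropy is the negative log-softmax at the target index.
        total += log_sum_exp - pixel_logits[target]
    return total / len(labels)


NUM_CLASSES = 3      # assumed number of semantic classes, for illustration only
FAKE = NUM_CLASSES   # extra index reserved for generated pixels

# Real frame: the targets are the ground-truth semantic labels.
real_logits = [[5.0, 0.0, 0.0, 0.0], [0.0, 5.0, 0.0, 0.0]]
real_loss = segmentation_discriminator_loss(real_logits, [0, 1])

# Generated frame: every pixel's target is the "fake" class.
fake_logits = [[0.0, 0.0, 0.0, 5.0], [5.0, 0.0, 0.0, 0.0]]
fake_loss = segmentation_discriminator_loss(fake_logits, [FAKE, FAKE])

# A discriminator with no preference (all-zero scores) pays log(N+1) per pixel.
uniform_loss = segmentation_discriminator_loss([[0.0] * (NUM_CLASSES + 1)], [FAKE])
```

In this toy setup the second generated pixel is mistaken for semantic class 0, so `fake_loss` comes out larger than `real_loss`. Under this formulation the generator would be trained so that its pixels are classified as their intended semantic class rather than as fake.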

Findings

01. Outperforms state-of-the-art models on the Cityscapes and KITTI-360 datasets.

02. Introduces an architecture and loss functions tailored to semantic video synthesis (SVS).

03. Produces more temporally coherent and realistic video sequences.

Abstract

In recent years, there has been a growing interest in Semantic Image Synthesis (SIS) through the use of Generative Adversarial Networks (GANs) and diffusion models. This field has seen innovations such as the implementation of specialized loss functions tailored for this task, diverging from the more general approaches in Image-to-Image (I2I) translation. While the concept of Semantic Video Synthesis (SVS) – the generation of temporally coherent, realistic sequences of images from semantic maps – is newly formalized in this paper, some existing methods have already explored aspects of this field. Most of these approaches rely on generic loss functions designed for video-to-video translation or require additional data to achieve temporal coherence. In this paper, we introduce the SVS-GAN, a framework specifically designed for SVS, featuring a custom…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Advanced Vision and Imaging

Methods: Diffusion · OASIS · Spatially-Adaptive Normalization