Xp-GAN: Unsupervised Multi-object Controllable Video Generation

Bahman Rouhani; Mohammad Rahmati

arXiv:2111.10233 · cs.CV · November 22, 2021

Open Access

TL;DR

Xp-GAN introduces an unsupervised method for controllable video generation, enabling users to manipulate object movements in videos through simple bounding box interactions, with results comparable to current state-of-the-art techniques.

Contribution

The paper presents a novel unsupervised approach that allows explicit control over object motion in video generation using bounding boxes, a feature lacking in prior methods.

Findings

01. Achieves controllable object movement in videos via bounding box manipulation.

02. Uses two autoencoders to separate motion and content information.

03. Results are comparable to existing state-of-the-art methods.

Abstract

Video generation is a relatively new yet popular subject in machine learning, owing to its wide range of potential applications and its numerous challenges. Current video generation methods give the user little or no control over how the objects in the generated video move and where they are located in each frame; that is, the user cannot explicitly control how each object in the video should move. In this paper we propose a novel method that allows the user to move any number of objects in a single initial frame simply by drawing bounding boxes over those objects and then moving those boxes along the desired path. Our model uses two autoencoders to fully decompose the motion and content information in a video, and achieves results comparable to well-known baselines and state-of-the-art methods.
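
To make the bounding-box interaction concrete, here is a minimal sketch of how a user-drawn box and its path might be turned into a per-frame control signal. It is not taken from the paper: the abstract says only that boxes are drawn and moved, so the linear interpolation, the binary-mask encoding, and the names `Box`, `interpolate_path`, `rasterize`, and `motion_masks` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical helper type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def interpolate_path(start: Box, end: Box, num_frames: int) -> List[Box]:
    """Move a box linearly from `start` to `end` over `num_frames` frames.

    Assumption: a straight-line path; a real interface could record any path.
    """
    if num_frames < 2:
        return [start]
    boxes = []
    for t in range(num_frames):
        a = t / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        boxes.append(Box(
            start.x + a * (end.x - start.x),
            start.y + a * (end.y - start.y),
            start.w + a * (end.w - start.w),
            start.h + a * (end.h - start.h),
        ))
    return boxes


def rasterize(boxes: List[Box], height: int, width: int) -> List[List[int]]:
    """Binary mask for one frame: 1 inside any box, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for b in boxes:
        # Clip each box to the frame before filling it in.
        x0, y0 = max(0, round(b.x)), max(0, round(b.y))
        x1 = min(width, round(b.x + b.w))
        y1 = min(height, round(b.y + b.h))
        for r in range(y0, y1):
            for c in range(x0, x1):
                mask[r][c] = 1
    return mask


def motion_masks(paths: List[List[Box]], height: int, width: int):
    """`paths` holds one per-frame box list per object; returns one mask per frame."""
    num_frames = len(paths[0])
    return [rasterize([p[t] for p in paths], height, width)
            for t in range(num_frames)]


# Usage: one 2x2 object dragged 4 pixels to the right over 3 frames.
path = interpolate_path(Box(0, 0, 2, 2), Box(4, 0, 2, 2), num_frames=3)
masks = motion_masks([path], height=4, width=8)
```

Each mask in such a sequence carries only where the objects should be, not what they look like, which is one plausible way to keep a motion signal separate from the appearance taken from the initial frame.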

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition