TL;DR
This paper introduces MTC-VAE, a self-supervised model for disentangling motion and content in videos using chunk-wise analysis and a novel reenactment loss, leading to improved video reconstruction and motion transfer.
Contribution
The paper proposes a new chunk-wise VAE approach with a Blind Reenactment Loss for better motion-content disentanglement in videos, outperforming existing methods.
Findings
Outperforms existing methods in disentanglement metrics
Achieves higher quality video reconstructions
Demonstrates effective motion transfer in reenactment tasks
Abstract
Independent components within low-dimensional representations are essential inputs in several downstream tasks, and provide explanations over the observed data. Video-based disentangled factors of variation provide low-dimensional representations that can be identified and used to feed task-specific models. We introduce MTC-VAE, a self-supervised motion-transfer VAE model to disentangle motion and content from videos. Unlike previous work on video content-motion disentanglement, we adopt a chunk-wise modeling approach and take advantage of the motion information contained in spatiotemporal neighborhoods. Our model yields independent per-chunk representations that preserve temporal consistency. Hence, we reconstruct whole videos in a single forward-pass. We extend the ELBO's log-likelihood term and include a Blind Reenactment Loss as an inductive bias to leverage motion disentanglement,…
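The chunk-wise approach mentioned in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the helper names `split_into_chunks` and `merge_chunks`, the use of non-overlapping chunks, and the choice to drop trailing frames that do not fill a chunk are all assumptions made for illustration. It only shows how a frame sequence might be grouped into fixed-length chunks that a model could encode independently and then reassemble in order.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_into_chunks(frames: Sequence[Frame], chunk_len: int) -> List[List[Frame]]:
    """Group consecutive frames into non-overlapping chunks of chunk_len frames.

    Assumption (not from the paper): trailing frames that do not fill a
    whole chunk are dropped rather than padded.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    num_chunks = len(frames) // chunk_len
    return [
        list(frames[i * chunk_len:(i + 1) * chunk_len])
        for i in range(num_chunks)
    ]


def merge_chunks(chunks: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Concatenate chunks back into a single frame sequence, preserving order."""
    return [frame for chunk in chunks for frame in chunk]


# Hypothetical usage: integers stand in for video frames.
frames = list(range(10))
chunks = split_into_chunks(frames, chunk_len=4)
restored = merge_chunks(chunks)
```

Because each chunk is a self-contained unit, all chunks of a video could in principle be stacked into one batch and processed together, which is one way to read the abstract's claim of reconstructing a whole video in a single forward pass.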
Taxonomy
Methods
