# Learning Character-Agnostic Motion for Motion Retargeting in 2D

**Authors:** Kfir Aberman, Rundi Wu, Dani Lischinski, Baoquan Chen, Daniel Cohen-Or

arXiv: 1905.01680 · 2019-05-13

## TL;DR

This paper introduces a deep learning approach for retargeting human motion captured in 2D video that bypasses 3D reconstruction, enabling robust transfer of motion between different performers and camera views.

## Contribution

The paper presents a method that extracts a high-level latent motion representation directly from video. The representation is invariant to skeleton geometry and camera view, so motion can be retargeted without explicitly reconstructing 3D poses or camera parameters.

## Key findings

- Outperforms existing retargeting methods on in-the-wild videos
- Enables applications such as performance cloning, video-driven cartoons, and motion retrieval
- Extracts motion directly from video without reconstructing 3D poses or camera parameters

## Abstract

Analyzing human motion is a challenging task with a wide variety of applications in computer vision and in graphics. One such application, of particular importance in computer animation, is the retargeting of motion from one performer to another. While humans move in three dimensions, the vast majority of human motions are captured using video, requiring 2D-to-3D pose and camera recovery, before existing retargeting approaches may be applied. In this paper, we present a new method for retargeting video-captured motion between different human performers, without the need to explicitly reconstruct 3D poses and/or camera parameters. In order to achieve our goal, we learn to extract, directly from a video, a high-level latent motion representation, which is invariant to the skeleton geometry and the camera view. Our key idea is to train a deep neural network to decompose temporal sequences of 2D poses into three components: motion, skeleton, and camera view-angle. Having extracted such a representation, we are able to re-combine motion with novel skeletons and camera views, and decode a retargeted temporal sequence, which we compare to a ground truth from a synthetic dataset. We demonstrate that our framework can be used to robustly extract human motion from videos, bypassing 3D reconstruction, and outperforming existing retargeting methods, when applied to videos in-the-wild. It also enables additional applications, such as performance cloning, video-driven cartoons, and motion retrieval.
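The decompose-and-recombine idea in the abstract can be illustrated with a toy sketch. This is not the authors' network: the random, untrained linear maps below merely stand in for learned encoders and a decoder, and every name and size (`encode`, `decode`, `retarget`, `J`, `T`, `D_M`, `D_S`, `D_V`) is a hypothetical choice made for illustration. The sketch only shows the data flow: a time-varying motion code, static skeleton and view codes pooled over time, and a decoder that recombines them.

```python
import random

random.seed(0)

J, T = 15, 64              # joints and frames (assumed sizes)
D_M, D_S, D_V = 8, 4, 2    # motion / skeleton / view code sizes (assumed)


def rand_matrix(rows, cols):
    """Random matrix standing in for trained weights or for pose data."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mean_over_time(seq):
    """Average a (frames x dims) sequence over frames, giving a static code."""
    return [sum(col) / len(seq) for col in zip(*seq)]


# Untrained stand-ins for the three encoders and the decoder (assumption).
W_MOTION = rand_matrix(2 * J, D_M)
W_SKEL = rand_matrix(2 * J, D_S)
W_VIEW = rand_matrix(2 * J, D_V)
W_DEC = rand_matrix(D_M + D_S + D_V, 2 * J)


def encode(poses):
    """poses: list of frames, each a flat list of 2*J joint coordinates."""
    motion = matmul(poses, W_MOTION)                   # one code per frame
    skeleton = mean_over_time(matmul(poses, W_SKEL))   # one code per clip
    view = mean_over_time(matmul(poses, W_VIEW))       # one code per clip
    return motion, skeleton, view


def decode(motion, skeleton, view):
    """Attach the static codes to every motion frame and map back to 2D poses."""
    latent = [frame + skeleton + view for frame in motion]
    return matmul(latent, W_DEC)


def retarget(source, target):
    """Keep the source clip's motion; take skeleton and view from the target."""
    motion, _, _ = encode(source)
    _, skeleton, view = encode(target)
    return decode(motion, skeleton, view)


video_a = rand_matrix(T, 2 * J)        # stand-in for performer A's 2D poses
video_b = rand_matrix(T // 2, 2 * J)   # stand-in for performer B, shorter clip
retargeted = retarget(video_a, video_b)
print(len(retargeted), len(retargeted[0]))  # 64 30
```

Because the skeleton and view codes are pooled over time, the target clip does not need the same length as the source; the output follows the source's frame count. With trained networks in place of the random maps, the same flow would produce performer B's skeleton performing performer A's motion.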

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.01680/full.md

## Figures

40 figures with captions in the complete paper: https://tomesphere.com/paper/1905.01680/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1905.01680/full.md

---
Source: https://tomesphere.com/paper/1905.01680