Towards Robust Direction Invariance in Character Animation
Li-Ke Ma, Zeshi Yang, Baining Guo, KangKang Yin

TL;DR
This paper investigates the challenge of achieving robust direction invariance in character animation, demonstrating the theoretical limitations and proposing a practical remedy that improves learning speed and quality in deep learning methods.
Contribution
It proves the nonexistence of a singularity-free scheme for direction invariance, connects the problem to the hairy ball theorem, and introduces a motion-based remedy to enhance deep learning approaches.
Findings
Robust direction invariant features improve learning speed.
The proposed method enhances final animation quality.
Theoretical proof of the nonexistence of a universal singularity-free scheme.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\JournalSubmission\BibtexOrBiblatex\electronicVersion\PrintedOrElectronic
Li-Ke Ma¹,², Zeshi Yang¹, Baining Guo³, KangKang Yin¹
¹ Simon Fraser University, ² Tsinghua University, ³ Microsoft Research Asia
Abstract
In character animation, direction invariance is a desirable property. That is, a pose facing north and the same pose facing south are considered the same; a character that can walk to the north is expected to be able to walk to the south in a similar style. To achieve such direction invariance, the current practice is to remove the facing direction’s rotation around the vertical axis before further processing. Such a scheme, however, is not robust for rotational behaviors in the sagittal plane. In search of a smooth scheme to achieve direction invariance, we prove that in general a singularity free scheme does not exist. We further connect the problem with the hairy ball theorem, which is better known to the graphics community. Because a singularity free scheme does not exist, there is no general solution, and we propose a remedy that uses a properly chosen motion direction to avoid singularities for the specific motions at hand. We perform comparative studies using two deep-learning based methods: one builds kinematic motion representations and the other learns physics-based controls. The results show that with our robust direction invariant features, both methods can achieve better results in terms of learning speed and/or final quality. We hope this paper can not only boost performance for character animation methods, but also help related communities that are currently not fully aware of the direction invariance problem to achieve more robust results.
CCS Concepts: Computing methodologies → Animation; Mathematics of computing → Algebraic topology
Volume 38, Issue 7
1 Introduction
In computer animation, digital characters are usually parameterized by an articulated rigid body system with six Degrees of Freedom (DoFs) at the root, and a tree of internal rotational joints. It is relatively well-known, however, that the six DoFs at the root, although necessary to fully specify a 3D pose for the character, are redundant in terms of specifying the motion tasks. For example, a walk to the north and a walk in the same style to the south are considered the same walk even though the directions of motion are different. It is thus a common practice to remove the root rotation about the gravitational direction, usually the $y$-axis in computer graphics, before further processing such as pose similarity computation in motion retrieval or control learning in physics-based animation.
To remove the redundant $y$-rotational DoF at the root, the current practice is to define a “facing direction” by the sagittal axis of a local frame defined at the root, as shown in Figure 1 left. This facing direction usually corresponds to where the character’s belly button points. By aligning the facing directions of all frames to the same plane, poses and motion features such as positions and velocities become $y$-rotation free and can then be compared or used in a more consistent way [KG04, PALvdP18].
The facing direction defined above works well for locomotion tasks, which have been the focus of most character animation research so far. There are cases, however, in which this facing direction is problematic. Figure 1 right shows a backflip motion during which the sagittal axis of the root aligns with the global $y$ axis twice, around the moments denoted by the vertical dashed lines. The computed $y$-rotations of the root frame will thus encounter singularities around these moments, as will be described in detail in Section 3. The motion features transformed by these derived $y$-rotations will change wildly around these moments as well. Such ill-behaved features can degrade the performance of various learning algorithms, as will be shown in Section 5. We therefore wish to devise a robust scheme that removes the direction of motion in a singularity free manner.
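To make the singularity concrete, the following minimal Python sketch computes a heading angle from a world-space facing direction and evaluates it just before and just after the facing direction passes through the vertical in a backflip-like pitch. The axis convention (up is $y$, a heading of zero faces $+z$) and the helper names `heading_angle` and `backflip_facing` are assumptions made for illustration and are not taken from the paper.

```python
import math

def heading_angle(facing):
    """Heading (radians) of a world-space facing direction about the vertical axis.

    Assumed convention: y is up and a heading of zero faces +z.
    """
    x, _, z = facing
    if math.hypot(x, z) < 1e-8:
        # The facing direction is parallel to the vertical: no heading exists.
        raise ValueError("heading is undefined for a vertical facing direction")
    return math.atan2(x, z)

def backflip_facing(pitch):
    """Facing direction after pitching a +z-facing root by `pitch` about its lateral axis."""
    return (0.0, math.sin(pitch), math.cos(pitch))

# Two poses only two degrees apart, on either side of the vertical.
before = heading_angle(backflip_facing(math.radians(89.0)))
after = heading_angle(backflip_facing(math.radians(91.0)))
print(before, after)  # the heading jumps by half a turn across the vertical
```

Any feature rotated by the negative of this heading inherits the same jump, which is the kind of discontinuity a learning method then has to cope with.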
We formulate this problem using mathematical tools from algebraic topology in Section 3. Then we prove that such a desired singularity-free smooth mapping scheme actually does not exist in general. We therefore propose a remedy to choose a well-behaved “motion direction” to replace the “facing direction” for specific motions at hand. The chosen motion direction should stay away from the global $y$ axis during the full course of the motion. For example, for the backflip shown in Figure 1, the lateral axis of the root qualifies as such a motion direction. We then formulate the calculation of direction-invariant features based on a given motion direction, either manually chosen or automatically computed, in Section 4. In Section 5, we validate that a well-chosen motion direction and its derived features are sufficient to avoid singularities in practice. We also perform comparative studies using two state-of-the-art animation methods on direction-invariant features calculated from the motion direction (DIM), direction-invariant features derived from the conventional facing direction (DIF), and global features (GF). Our results show that DIM features help achieve better performance in terms of synthesized motion quality and/or learning speed.
With recent advances and growth in machine learning and computer vision, we have seen publications related to character animation in non-graphics communities [ZAv17, ZLX∗18]. These communities seem to handle motion directions in many different ways. We show that by using our robust direction-invariant features, some of these algorithms can greatly boost their performance without any change to their training or learning components. We hope that our solution and comparative studies can help advocate robust direction-invariant features to wider communities and facilitate faster scientific advances in related fields.
2 Related Works
2.1 Character Animation
Kinematic character animation has a long history of comparing poses [KGP02] and modelling motions [HKS17, ACOH∗18] in a direction invariant manner. It is known that “a motion is defined only up to a rigid 2D coordinate transformation. That is, the motion is fundamentally unchanged if we translate it along the floor plane or rotate it about the vertical axis” [KGP02]. The rotation about the vertical axis can be calculated from sampled point clouds on the digital character [KGP02], or simply from the facing direction extracted from the root orientations [HSK16, HKS17]. In this work we also use root orientations, but our framework can be applied to other schemes as well.
Various physics-based animation methods also need to handle the motion direction. In hand-crafted controllers such as [HWBO95, YLvdP07, CBvdP10], features are transformed into the sagittal plane and the lateral plane before applying controls in each plane respectively, so that the simulated character can walk in any direction. In [KH17], when the reference motion involves a high speed rotation during the flight phase, the forward facing direction is obtained by linearly interpolating the initial and final forward facing directions to achieve more robust reference coordinate systems. Trajectory optimization methods usually optimize for a specific 3D trajectory with no direction invariance [WK88, MTP12, ABdLH13, LK14]. Recently deep learning and deep reinforcement learning (DRL) have been applied successfully to learning physics-based controls for character animation [PALvdP18, LL18, YTL18, SLL19]. Features are usually converted to be facing-direction invariant before being passed to the deep neural networks (DNNs). We chose DeepMimic [PALvdP18] as one of the two methods for our comparative study in Section 5.2, as it produces state-of-the-art results in terms of motion variety and quality, and its source code was released by the original authors and is readily available.
2.2 Related Fields
The computer vision and machine learning communities have been investigating motion prediction and synthesis for some time [WFH07, THR07, FLFM15, JZSS16, MBR17, GWLM18, ZLX∗18]. Among these publications, some avoided using root orientations altogether [GWLM18, ZLX∗18]; some used incremental changes around the gravitational vertical, and absolute pitch and roll relative to the facing direction [THR07, MBR17]; some used root orientations as is [WFH07]; and for some it is not clear from the writing how root orientations were handled [FLFM15, JZSS16]. We chose acRNN [ZLX∗18] as one of the two methods for our comparative study in Section 5.1, as it generates the best results in terms of long-term motion quality and complexity, and its source code was released by the original authors and is readily available.
Human action recognition is also a classic problem in computer vision [LGN14, SZ14, ZAv17]. Most work deals directly with pixel features. A few choose to first reconstruct skeletal poses from video, and then use features in the global frame for further learning [ZAv17]. We think such recognition tasks would also benefit from using our robust direction-invariant features.
3 Nonexistence of a Singularity-free Mapping Scheme
We use $R$ to denote an orientation in $SO(3)$, and $R_1 R_2$ to denote the composition of two orientations $R_1$ and $R_2$. Our derivations and proofs are independent of the specific rotation parameterization method. For example, $R$ can be parameterized by quaternions and rotation composition can be achieved through quaternion multiplication. We use $R(v)$ to denote a vector $v$ rotated by $R$.
3.1 Problem Formulation
We consider two orientations in $SO(3)$ equivalent if they are connected by a rotation with respect to the gravity direction, namely a rotation $R_y$ about $y$. We denote the set of all $R_y$ as $SO(2)_y$, which is isomorphic to $SO(2)$. We also denote the orbit of an orientation $R$ under rotations in $SO(2)_y$ as $O_R$:
$$O_R = \{ R_y R \mid R_y \in SO(2)_y \} \qquad (1)$$
Rotations in the same orbit are considered equivalent. Denote the set of orbits as $\mathcal{O}$. Then $\mathcal{O}$ is homeomorphic to the 2-sphere $S^2$ with a natural homogeneous space structure as the quotient of two Lie groups $SO(3)$ and $SO(2)$:
$$\mathcal{O} \cong SO(3)/SO(2) \cong S^2 \qquad (2)$$
Calculating a $y$-rotation invariant frame for an orientation $R$ requires projecting $R$ to a representative element $\hat{R}$ of the orbit $O_R$, so that all orientations in $O_R$ are aligned with $\hat{R}$. $\hat{R}$ is chosen manually, such as the one described in Section 4. Formally speaking, we wish to find a mapping scheme $f$ from all orbits in $\mathcal{O}$ to representative elements in $SO(3)$ subject to certain constraints:
$$f: \mathcal{O} \to SO(3), \quad \text{s.t.} \quad f(O_R) \in O_R \qquad (3)$$
The question is whether there is a smooth singularity-free mapping that satisfies the above property. We provide two proofs that such a singularity-free mapping does not exist. One proof follows fiber bundle theory [Hat02]. The other connects to the hairy ball theorem [EG79], which is more familiar to the graphics community [FSMD07].
3.2 Proof 1: by Fiber Bundle Theory
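Our orbit operator defines a fiber bundle $SO(2) \to SO(3) \to S^2$. In the quest of a smooth mapping $f$, we wish to find a globally defined cross section of this fiber bundle. Such a cross section would make the bundle trivial, that is, $SO(3)$ would be homeomorphic to the product of base and fiber. However,
$$SO(3) \not\cong S^2 \times SO(2) \qquad (4)$$
since $SO(3)$ is homeomorphic to the 3-dimensional real projective space $\mathbb{RP}^3$, which is not homeomorphic to $S^2 \times S^1$. Thus there does not exist such a global cross section [Hat02].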
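The non-homeomorphism used in this step can be checked with fundamental groups. The short derivation below is a standard textbook argument, sketched here for completeness rather than quoted from the paper:

```latex
\pi_1\big(SO(3)\big) \;\cong\; \pi_1\big(\mathbb{RP}^3\big) \;\cong\; \mathbb{Z}/2\mathbb{Z},
\qquad
\pi_1\big(S^2 \times S^1\big) \;\cong\; \pi_1(S^2) \times \pi_1(S^1) \;\cong\; \{0\} \times \mathbb{Z} \;\cong\; \mathbb{Z}.
```

Homeomorphic spaces have isomorphic fundamental groups, and $\mathbb{Z}/2\mathbb{Z} \not\cong \mathbb{Z}$, so the two spaces cannot be homeomorphic.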
3.3 Proof 2: by Hairy Ball Theorem
Imagine a unit tangent vector $t_0$ attached to the north pole $p_0 = y$ of a unit sphere. Now we define a mapping $g$ that maps an element $R \in SO(3)$ to the unit tangent vector $R^{-1}(t_0)$ located at $R^{-1}(p_0)$ on the sphere, by rotating this sphere by $R^{-1}$.
Lemma 3.1.
The mapping $g$ between $SO(3)$ and unit tangent vectors on the sphere is bijective.
Proof 3.2.
(a) A rotation is fully determined by where it sends $p_0$ and $t_0$, so distinct elements of $SO(3)$ send $t_0$ to distinct unit tangent vectors on the sphere, and the mapping is injective. (b) For any unit tangent vector $t$ on the sphere, there is always a rotation that can rotate $t_0$ to $t$, so the mapping is surjective. Thus the mapping is bijective.
Lemma 3.3.
$g$ is a bijective mapping between $O_R$ and the unit tangent vector space at $R^{-1}(p_0)$.
Proof 3.4.
(a) $\forall R_y \in SO(2)_y$, $g$ sends $R_y R$ to a fixed point on the sphere, and to a unit tangent vector at this point, because
$$(R_y R)^{-1}(p_0) = R^{-1}\big(R_y^{-1}(p_0)\big) = R^{-1}(p_0)$$
(b) Any unit tangent vector at $R^{-1}(p_0)$ corresponds to a rotation in $O_R$. This is because for any unit tangent vector $t$ at $R^{-1}(p_0)$, $t$ corresponds to a rotation $R'$ due to the bijection property of $g$ that we just proved in Lemma 3.1, and $R'^{-1}$ also sends $p_0$ to $R^{-1}(p_0)$. Now we take a look at the point $p_0$:
$$R' R^{-1}(p_0) = R'\big(R'^{-1}(p_0)\big) = p_0$$
$p_0$ is invariant under the rotation $R' R^{-1}$. We know that only rotations in $SO(2)_y$ can send the north pole to itself, thus $R' R^{-1} \in SO(2)_y$, and therefore $R' \in O_R$.
Theorem 3.5.
Under the mapping $g$, a solution $f$ of Equation 3 corresponds to a unit vector field on the sphere.
Proof 3.6.
Since there is a bijection between $O_R$ and unit tangent vectors at $R^{-1}(p_0)$ according to Lemma 3.3, selecting a representative element $\hat{R}$ from each orbit $O_R$, as the mapping $f$ in Equation 3 does, is equivalent to selecting one unit tangent vector for each point on the sphere. Therefore $f$ corresponds to a unit tangent vector field on the sphere. Hereafter we denote this corresponding vector field as $v_f$.
This is exactly what the Hairy Ball Theorem concerns [EG79]: you simply cannot comb a sphere without creating at least one cowlick somewhere. That is, there is no singularity-free unit tangent vector field on a sphere, which is relatively well-known in the graphics community [FSMD07]. Therefore, there is no singularity-free mapping that satisfies Equation 3.
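The identification of orbits with points on the sphere can be checked numerically. The Python sketch below assumes that root orientations are 3x3 rotation matrices mapping root-frame vectors to the world frame, that $y$ is up, and that an orbit is generated by left-multiplying with rotations about $y$; the helper names are hypothetical. It shows that the vertical axis expressed in the root frame, $R^{-1}(y)$, is the same for every member of an orbit, whereas $R(y)$ is not.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return ((1.0, 0.0, 0.0), (0.0, c, -s), (0.0, s, c))

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def transpose(A):
    return tuple(tuple(A[j][i] for j in range(3)) for i in range(3))

def apply(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(3)) for i in range(3))

UP = (0.0, 1.0, 0.0)

def sphere_point(R):
    """R^{-1}(y): the vertical axis expressed in the root frame."""
    return apply(transpose(R), UP)  # the inverse of a rotation is its transpose

R = matmul(rot_y(0.3), rot_x(1.1))          # an arbitrary tilted root orientation
yaws = (0.0, 0.8, 2.5, -1.7)
orbit = [matmul(rot_y(a), R) for a in yaws]  # members of the orbit of R

points = [sphere_point(Q) for Q in orbit]
spread = max(math.dist(points[0], p) for p in points)

naive = [apply(Q, UP) for Q in orbit]
naive_spread = max(math.dist(naive[0], p) for p in naive)
print(spread, naive_spread)  # first is numerically zero, second is not
```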
4 Robust Direction-invariant Features
In order to map all motion features into a direction invariant set, we need to first calculate the $y$-rotation $R_y$ of the root frame $R$, so that the transformed features become invariant to the direction of motion. We formulate one of the typical algorithms used in the animation community for this purpose in Section 4.2, but generalize it by replacing the facing direction with the motion direction. The motion direction can be either manually selected for simple rotational tasks, or automatically computed as described in Section 4.1.
4.1 Motion Direction Computation
Following the proof in Section 3.3, we derive an algorithm as shown in Algorithm 1 to automatically choose the best motion direction $d$ for a given root orientation trajectory $R_t$. The idea is to maximize the geodesic distance from $R_t(d)$ to $y$ and $-y$ to stay away from singularities for the full course of the motion, which is equivalent to maximizing the geodesic distance from $d$ to $R_t^{-1}(y)$ and $R_t^{-1}(-y)$. We will also use the trajectory $R_t^{-1}(y)$ and the vector field $v_f$ in our visualizations in Figures 3 and 6.
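Algorithm 1 itself is not reproduced in this text, so the following Python sketch should be read only as one plausible brute-force realization of the stated idea, not as the authors' algorithm. It assumes the trajectory is given as a list of unit vectors (the vertical axis expressed in the root frame at each frame), samples candidate directions on a Fibonacci lattice, and keeps the candidate whose smallest geodesic distance to any of these vectors or their antipodes is largest. The function names, the candidate count, and the assumed root axes (sagittal $+z$, lateral $+x$) are all hypothetical.

```python
import math

def fibonacci_sphere(n):
    """Roughly uniform unit vectors on the sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - y * y)
        points.append((r * math.cos(golden_angle * i), y, r * math.sin(golden_angle * i)))
    return points

def clearance(d, poles):
    """Smallest geodesic distance from d to any pole p or its antipode -p."""
    worst = math.pi
    for p in poles:
        dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(d, p))))
        angle = math.acos(dot)
        worst = min(worst, angle, math.pi - angle)
    return worst

def choose_motion_direction(poles, n_candidates=2000):
    """Pick the root-frame direction that stays farthest from all poles."""
    # Try the assumed sagittal (+z) and lateral (+x) axes first, then lattice samples.
    candidates = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)] + fibonacci_sphere(n_candidates)
    return max(candidates, key=lambda d: clearance(d, poles))

# Hypothetical backflip: seen from the root, the vertical sweeps the y-z great circle.
angles = [2.0 * math.pi * k / 120 for k in range(120)]
poles = [(0.0, math.cos(a), math.sin(a)) for a in angles]
best = choose_motion_direction(poles)
print(best, clearance(best, poles))
```

For this synthetic backflip the search settles on the lateral axis, while the sagittal axis has zero clearance because the vertical passes straight through it.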
4.2 Direction-invariant Features
Assume the current root frame orientation represented in the global world frame is $R$. Its $y$-rotation $R_y$ induced by a motion direction $d$ can be derived as follows, as illustrated in Figure 2. First, rotate the motion direction $d$ by $R$ to get $d' = R(d)$. Second, calculate the dihedral angle $\theta$ between two vertical planes: the plane containing $y$ and $d'$, and a fixed vertical reference plane. Then $R_y$ equals $(y, \theta)$ in the axis-angle representation, i.e., the rotation about $y$ that rotates $d'$ onto the reference plane. Finally, $\hat{R} = R_y R$, and all orientations in the orbit $O_R$ are mapped to the same representative element $\hat{R}$. The vector field $v_f$ that corresponds to the mapping $f$ induced from a motion direction $d$ can then be calculated as the unit tangent vector associated with $\hat{R}$ at the point $R^{-1}(y)$ on the sphere. Note that we only need $R_y$ and not $\hat{R}$ to transform a motion feature into its direction invariant version. In our experiments in Section 5, the motion direction is chosen as follows: by default we choose the original facing direction axis; if it does not work, we choose the lateral axis; when both fail, we compute another axis as described in Algorithm 1.
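The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: orientations are 3x3 rotation matrices mapping root-frame vectors to the world frame, $y$ is up, the reference plane is taken to be the half-plane $x = 0$, $z > 0$, and features are world-space 3-vectors. The names `yaw_removal` and `to_invariant` are hypothetical.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return ((1.0, 0.0, 0.0), (0.0, c, -s), (0.0, s, c))

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return ((c, 0.0, s), (0.0, 1.0, 0.0), (-s, 0.0, c))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def apply(A, v):
    return tuple(sum(A[i][k] * v[k] for k in range(3)) for i in range(3))

def yaw_removal(R, d):
    """Rotation about y that brings R(d) into the assumed reference half-plane."""
    dx, _, dz = apply(R, d)  # motion direction expressed in the world frame
    if math.hypot(dx, dz) < 1e-8:
        raise ValueError("motion direction is vertical: the scheme is singular here")
    return rot_y(-math.atan2(dx, dz))

def to_invariant(R, d, feature_world):
    """Direction-invariant version of a world-space vector feature."""
    return apply(yaw_removal(R, d), feature_world)

d = (1.0, 0.0, 0.0)                      # assumed lateral axis of the root
R0 = matmul(rot_y(0.4), rot_x(1.3))      # a root pitched well away from upright
x0 = (0.2, -0.1, 1.0)                    # some world-space feature, e.g. a velocity

# Turn the whole scene about the vertical by several angles and re-extract the feature.
outputs = []
for yaw in (0.0, 1.0, 2.5, -3.0):
    R = matmul(rot_y(yaw), R0)
    x = apply(rot_y(yaw), x0)
    outputs.append(to_invariant(R, d, x))
print(outputs)  # all entries agree up to rounding
```

Only the rotation about $y$ is needed to transform the feature; the representative orientation is never formed explicitly.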
As we have proved in Section 3.3, one direction-invariant mapping scheme corresponds to one unique unit vector field on the sphere. We thus visualize two mapping schemes in Figure 3. Figure 3 (a) is induced by choosing the conventional facing direction, i.e., the sagittal axis of the root, as the motion direction. Figure 3 (b) is induced by choosing the lateral axis of the root as the motion direction. We also visualize the trajectory of root frames by plotting $R_t^{-1}(y)$ on the unit sphere. The singularity points are calculated by Helmholtz decomposition [KC13] from the vector field $v_f$. As we can see, different mapping schemes put singularity points at different locations on the sphere, while the trajectories of root frame orientations remain the same. Thus we can strategically avoid singularities by choosing an appropriate motion direction for a specific motion task.
5 Results
We perform comparative studies on motion synthesis tasks using one kinematic method [ZLX∗18] and one physics-based approach [PALvdP18]. In both studies, we compare performance using global features (GF), direction-invariant features calculated from the facing direction (DIF), and direction-invariant features calculated from the motion direction (DIM). Note that DIF is a special case of DIM. We therefore first compare GF vs. DIF: if DIF performs better than GF, DIM will too. Then we compare DIF vs. DIM for those motions for which DIF is problematic. In the paper, we report the performance in terms of motion quality and learning speed. We encourage readers to also watch the supplemental video to better judge the differences visually. We use the same color scheme for all the figures and video clips: the green character shows the ground truth captured reference motion; the red character visualizes results from GF; the blue character shows motions synthesized from DIF; and the yellow character presents motions from DIM.
5.1 Kinematic Character Animation
We chose acRNN [ZLX∗18] as the kinematic motion synthesis method for the comparative study, as it is the only one that can synthesize long-term and complex motions, to the best of our knowledge. A trained acRNN (auto-conditioned Recurrent Neural Net) can generate long motions similar in style to those in the training set, given a short input motion clip. We use the code released by the original authors, and all training settings and parameters remain the same, except for the input motion features, for which we use three different sets: GF, DIF, and DIM. The features used in the original paper, including the root velocity and the positions of internal joints relative to the root, are represented in the global world frame and are thus used as the GF set. We further convert all features to the DIF and DIM sets, adding the root’s angular velocity around the $y$ axis so that the full motion trajectory can be generated by integration. All models with different input feature sets are trained for the same amount of time. We use the publicly available CMU motion capture database and downsample the selected motions to 60 Hz as our training data.
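The role of the added angular velocity can be illustrated with a small integration sketch in Python. It assumes forward-Euler integration, a heading measured from $+z$ about an upward $y$ axis, and planar root velocities expressed in the heading-aligned frame; the function name `integrate_root` and these conventions are assumptions for illustration, not details taken from the acRNN code.

```python
import math

def integrate_root(yaw_rates, planar_velocities, dt):
    """Forward-Euler integration of heading and ground-plane position.

    yaw_rates: angular velocity about the vertical axis per frame (rad/s).
    planar_velocities: (vx, vz) per frame, expressed in the heading-aligned frame.
    Returns a list of (x, z, yaw) samples, starting at the origin.
    """
    x = z = yaw = 0.0
    path = [(x, z, yaw)]
    for w, (vx, vz) in zip(yaw_rates, planar_velocities):
        c, s = math.cos(yaw), math.sin(yaw)
        # Rotate the heading-frame velocity into the world frame, then step.
        x += (c * vx + s * vz) * dt
        z += (-s * vx + c * vz) * dt
        yaw += w * dt
        path.append((x, z, yaw))
    return path

# Two seconds at 60 Hz: walk forward at 1 m/s while turning a quarter turn per second.
frames = 120
path = integrate_root([math.pi / 2.0] * frames, [(0.0, 1.0)] * frames, 1.0 / 60.0)
print(path[-1])
```

Without the yaw rate, direction-invariant features would describe the pose sequence but not where the character ends up facing or travelling in the world.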
5.1.1 GF vs. DIF
To compare models with GF vs. DIF, we use the Indian dance motion set from the original paper [ZLX∗18]. This dataset contains roughly 421 seconds of training motion, and we trained both models for 50 hours. Figure 4 shows several snapshots taken from synthesized motions generated by the trained models. The model trained with the rotation-invariant features DIF can generate significantly longer motions of higher quality, in contrast to motions generated by GF, which tend to become unrealistic after a few seconds. The rotation-invariant feature representation greatly reduces the complexity of the models that need to be learned.
5.1.2 DIF vs. DIM
To compare models learned with DIF vs. DIM features, we prepared a dataset of about 203 seconds which consists of various rotational behaviors such as break dance and acrobatic motions. We trained both models for 40 hours. Figure 5 shows snapshots of synthesized motions. The model trained with DIF fails to generate some rotational motions, while the DIM model using the lateral axis of the root as the motion direction can successfully reproduce these behaviors.
We further note that when the facing direction does not work as a valid motion direction, the next direction we try is the lateral axis of the root. However, there are rotational behaviors for which neither direction qualifies as a good motion direction. Figure 6 shows one example with a spin dance motion. Features derived from both the sagittal axis and the lateral axis are bad. We thus use Algorithm 1 to automatically compute a robust motion direction for this motion. As shown at the top of Figure 6, features derived from this axis are much smoother. In the supplementary video, we also visualize all the transformed positional features; those derived from the sagittal axis and the lateral axis cause the character to rotate wildly.
5.2 Physics-based Character Animation
We chose DeepMimic [PALvdP18] as the physics-based motion synthesis method for the comparative study, as it produces the best results in terms of motion quality and motion variety. DeepMimic combines a motion-imitation objective with a task objective, and trains characters using DRL to, say, walk in a desired direction. The input features are high-dimensional, and we only alter the state that describes the configuration of the character’s body, including the relative positions of each link with respect to the root, their rotations expressed in quaternions, and their linear and angular velocities. We refer interested readers to the original paper for more details [PALvdP18]. We run the code released by the original authors on a machine with an 18-core i9-7980XE CPU, and can draw about five million samples per hour. We keep all other training parameters the same. The training motion capture clips are either from the data downloaded together with the code, or selected from the CMU mocap database.
5.2.1 GF vs. DIF
To compare controllers learned from GF vs. DIF, we used two cyclic motion tasks that slowly turn in the horizontal plane: a walk (8.8s) and a backflip (10.6s). The original DeepMimic code used features in the facing frame, i.e., the DIF scheme. We further tested features in the global frame (GF). As there is some randomness in deep reinforcement learning, we trained each task three times with different random seeds, and evaluated the policy every 100th iteration as in DeepMimic. Figures 7 and 8 show the training curves for the walk and the backflip, respectively. Solid curves correspond to mean returns and shaded regions correspond to the minimal and maximal returns among the three trials. Using GF substantially slows down the training and results in failure in both tasks, i.e., the character loses balance and falls to the ground. In contrast, the DIF features facilitated learning of direction-invariant locomotion controllers. We also encourage readers to refer to our supplemental video to see the animated results. Note, however, that there is little difference when training, for example, a short straight walk. In such cases the controller does not need to be invariant to different facing directions, and both schemes can be successful.
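The way the training curves are summarized (a mean curve with a min-to-max band over three seeds) can be sketched as follows. The function name `summarize_runs` and the toy numbers are hypothetical and are not results from the paper.

```python
def summarize_runs(runs):
    """Per-evaluation (mean, min, max) over several training runs.

    runs: one list of returns per random seed, sampled at the same evaluation
    points. Runs of unequal length are truncated to the shortest one.
    """
    n = min(len(r) for r in runs)
    summary = []
    for values in zip(*(r[:n] for r in runs)):
        summary.append((sum(values) / len(values), min(values), max(values)))
    return summary

# Toy returns for three seeds (made-up numbers, for illustration only).
runs = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
curve = summarize_runs(runs)
print(curve)
```

The first entry of each tuple would be drawn as the solid curve and the last two as the edges of the shaded region.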
5.2.2 DIF vs. DIM
To compare DIF vs. DIM, we tested rotational motions such as crawl, roll and backflip. We first plot and compare the DIF and DIM features for a backflip in Figure 9. There are severe discontinuities in the motion features in the DIF scheme. We also trained DRL policies with DIF and DIM features. The DIM motion direction is chosen to be the lateral axis of the root in all test cases. Learning with DIM features results in faster training and better motion quality as shown in Figure 10. In these tasks, the mapped root frame crossed the singularities on the unit sphere in the DIF scheme (e.g., Figure 2), while the DIM scheme successfully avoided them. Note that for certain cases, such as the backflip, learning with DIF and DIM features showed little difference. We conjecture that the controller can easily succeed at the moment of singularity crossing, because the character is airborne and carried by its own inertia.
6 Discussion
To conclude, we have proved that in general there is no singularity-free scheme to achieve direction invariance in character animation. However, a properly-chosen motion direction and its derived features can stay far away from singularities for specific motions and work well in practice. The traditionally-chosen facing direction has worked so far because locomotion tasks, which have been the focus of character animation research, generally only involve rotations around the gravitational direction. Our robust direction invariant features derived from motion directions will enable better results for modelling and synthesis of more versatile behaviors such as gymnastic and dance motions. A good motion direction can be either selected manually, or computed automatically for complex rotational behaviors that contain rotations around multiple axes.
We note that even though kinematic character controllers are usually facing direction invariant today, most physics-based controllers are not. For example in DeepMimic, the learned controller tries to track the absolute root orientation in the reference motion. If the reference character faces north and walks north, the simulated character will always try to face north. Even if the simulated character is initially positioned facing south, it will still try to turn north and walk north. This will result in either a fall or an awkward sharp turn to the north. If this global orientation constraint could be lifted in the DRL learning, we would expect greater performance gaps among the GF, DIF and DIM schemes. Optimization-based control methods such as trajectory optimization have also mainly worked on direction-specific controllers. We expect that our DIM features will achieve performance gains over DIF for optimization-based methods as well, as these methods usually need to calculate derivatives, and DIM features are smoother and better behaved than DIF. Lastly, we note that although transformed rotations in $SO(3)$ are smooth using our DIM scheme, widely used rotation representations such as quaternions can still have discontinuities and pose difficulty for learning. We recommend using continuous rotation representations such as the ones recently proposed in [ZBJ∗19].
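As a concrete illustration of the last point, the Python sketch below contrasts the quaternion sign ambiguity with a six-number representation made of the first two columns of the rotation matrix, recovered by Gram-Schmidt orthogonalization. It assumes that [ZBJ∗19] refers to this kind of 6D representation; the reference entry is not included in this text, so treat that link as an assumption. All function names are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def quat_to_columns(q):
    """Columns of the rotation matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    c1 = (1 - 2 * (y * y + z * z), 2 * (x * y + w * z), 2 * (x * z - w * y))
    c2 = (2 * (x * y - w * z), 1 - 2 * (x * x + z * z), 2 * (y * z + w * x))
    c3 = (2 * (x * z + w * y), 2 * (y * z - w * x), 1 - 2 * (x * x + y * y))
    return c1, c2, c3

def rotation_from_6d(a, b):
    """Gram-Schmidt: rebuild three orthonormal columns from two 3-vectors."""
    e1 = normalize(a)
    proj = sum(p * q for p, q in zip(e1, b))
    e2 = normalize(tuple(q - proj * p for p, q in zip(e1, b)))
    e3 = cross(e1, e2)
    return e1, e2, e3

q = normalize((0.9, 0.1, -0.3, 0.2))
neg_q = tuple(-c for c in q)

# q and -q are different 4-vectors but describe the same rotation,
# so their matrix columns, and hence the 6D features, coincide.
c1, c2, c3 = quat_to_columns(q)
n1, n2, n3 = quat_to_columns(neg_q)

# A slightly perturbed, non-orthonormal 6D vector still maps back to a valid rotation.
noisy_a = tuple(1.1 * c for c in c1)
noisy_b = tuple(p + 0.05 * r for p, r in zip(c2, c1))
e1, e2, e3 = rotation_from_6d(noisy_a, noisy_b)
print(e1, e2, e3)
```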
In the future, we wish to examine how to achieve direction invariance for end-to-end computer vision systems such as for human activity recognition. In such systems, only pixel data and/or 2D positions are available as input features, and the DNN needs to learn direction invariance somehow. 3D direction-invariant features will definitely accelerate this process, but it is not obvious how to derive such features from 2D inputs without another layer of 2D to 3D pose lifting [TRA17].
Acknowledgements
We would like to thank Runjie Hu for insightful discussions during the development of our proofs in Section 3, and Yiying Tong for proofreading these proofs. This project is partially supported by NSERC Discovery Grants Program RGPIN-06797 and RGPAS-522723. Li-Ke Ma is supported by a CSC scholarship from the China Scholarship Council.
References
- [ABdLH13] Al Borno M., de Lasa M., Hertzmann A.: Trajectory optimization for full-body movements with complex contacts. TVCG 19, 8 (2013), 1405–1414.
- [ACOH∗18] Aristidou A., Cohen-Or D., Hodgins J. K., Chrysanthou Y., Shamir A.: Deep motifs and motion signatures. ACM Trans. Graph. 37, 6 (Dec. 2018), 187:1–187:13.
- [CBvdP10] Coros S., Beaudoin P., van de Panne M.: Generalized biped walking control. ACM Trans. Graph. 29, 4 (2010), Article 130.
- [EG79] Eisenberg M., Guy R.: A proof of the hairy ball theorem. The American Mathematical Monthly 86, 7 (1979), 571–574.
- [FLFM15] Fragkiadaki K., Levine S., Felsen P., Malik J.: Recurrent network models for human dynamics. In ICCV (2015), pp. 4346–4354.
- [FSMD07] Fisher M., Schröder P., Desbrun M., Hoppe H.: Design of tangent vector fields. ACM Trans. Graph. 26 (2007), Article No. 56.
- [GWLM18] Gui L.-Y., Wang Y.-X., Liang X., Moura J. M.: Adversarial geometry-aware human motion prediction. In ECCV (2018), pp. 786–803.
- [Hat02] Hatcher A.: Algebraic Topology. Cambridge University Press, 2002.
