Kling-MotionControl Technical Report

Kling Team: Jialu Chen; Yikang Ding; Zhixue Fang; Kun Gai; Kang He; Xu He; Jingyun Hua; Mingming Lao; Xiaohan Li; Hui Liu; Jiwen Liu; Xiaoqiang Liu; Fan Shi; Xiaoyu Shi; Peiqin Sun; Songlin Tang; Pengfei Wan; Tiancheng Wen; Zhiyong Wu; Haoxian Zhang; Runze Zhao; Yuanxing Zhang; Yan Zhou

arXiv:2603.03160 · cs.CV · March 4, 2026

Open Access

TL;DR

Kling-MotionControl is a comprehensive framework that enables high-fidelity, controllable, and expressive character animation with robust cross-identity generalization and accelerated inference, surpassing existing solutions in quality and flexibility.

Contribution

The paper introduces Kling-MotionControl, a novel DiT-based system that integrates heterogeneous motion representations, adaptive identity-agnostic learning, and multi-stage distillation for efficient, expressive, and generalizable character animation.

Findings

01. Outperforms leading commercial and open-source solutions in fidelity and control.

02. Achieves over 10x faster inference speed through multi-stage distillation.

03. Demonstrates robust generalization across diverse character types and styles.

Abstract

Character animation aims to generate lifelike videos by transferring motion dynamics from a driving video to a reference image. Recent strides in generative models have paved the way for high-fidelity character animation. In this work, we present Kling-MotionControl, a unified DiT-based framework engineered specifically for robust, precise, and expressive holistic character animation. Leveraging a divide-and-conquer strategy within a cohesive system, the model orchestrates heterogeneous motion representations tailored to the distinct characteristics of body, face, and hands, effectively reconciling large-scale structural stability with fine-grained articulatory expressiveness. To ensure robust cross-identity generalization, we incorporate adaptive identity-agnostic learning, facilitating natural motion retargeting for diverse characters ranging from realistic humans to stylized…

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis