FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control

Mingzhi Sheng; Zekai Gu; Peng Li; Cheng Lin; Hao-Xiang Guo; Ying-Cong Chen; Yuan Liu

arXiv:2602.13185 · cs.CV · February 16, 2026

TL;DR

FlexAM introduces a 3D control signal that represents video dynamics as a point cloud and disentangles appearance from motion in video generation. A single framework then covers I2V/V2V editing, camera control, and spatial object editing.

Contribution

The paper presents FlexAM, a unified framework with a 3D control signal that effectively disentangles appearance and motion for versatile video generation control.

Findings

01. Achieves superior performance in video editing tasks
02. Effectively disentangles appearance and motion
03. Supports diverse control tasks like camera and object editing

Abstract

Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing. Extensive experiments demonstrate that FlexAM achieves superior performance across all…
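The abstract names multi-frequency positional encoding of point-cloud coordinates but does not give its form here. As a rough illustration only, the sketch below uses the common sinusoidal scheme with octave-spaced frequencies; this is an assumed stand-in, not FlexAM's actual formulation. The function names, the `num_bands` parameter, and the convention that `z` holds normalized depth are all hypothetical.

```python
import math


def multi_frequency_encoding(point, num_bands=6):
    """Encode one (x, y, z) point with sin/cos at octave-spaced frequencies.

    Assumption: coordinates are normalized to roughly [0, 1], and z is
    treated as depth. This is a generic sinusoidal encoding, not the
    paper's exact method.
    """
    features = []
    for coord in point:
        for k in range(num_bands):
            angle = (2.0 ** k) * math.pi * coord
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


def encoding_distance(p, q, num_bands):
    """Euclidean distance between the encodings of two points."""
    return math.dist(
        multi_frequency_encoding(p, num_bands),
        multi_frequency_encoding(q, num_bands),
    )


# Two points that differ by a small displacement along x.
a = (0.50, 0.50, 0.50)
b = (0.51, 0.50, 0.50)

# Higher-frequency bands amplify the small displacement in feature space.
print(encoding_distance(a, b, num_bands=1))
print(encoding_distance(a, b, num_bands=6))
```

With only the lowest band, the two nearby points map to nearly identical features. Adding higher bands pulls their encodings apart, which is the general intuition behind using several frequencies to distinguish fine-grained motion.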

Code & Models

Models

🤗 SandwichZ/Wan2.2-Fun-5B-FLEXAM (model, 45 downloads)

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation