LIA-X: Interpretable Latent Portrait Animator

Yaohui Wang; Di Yang; Xinyuan Chen; Francois Bremond; Yu Qiao; Antitza Dantcheva

arXiv:2508.09959 · cs.CV · August 14, 2025

TL;DR

LIA-X is an interpretable portrait animation framework that uses a novel sparse motion dictionary to give fine-grained control over facial dynamics transfer, outperforming previous methods on self- and cross-reenactment benchmarks.

Contribution

LIA-X introduces linear navigation of motion codes and a Sparse Motion Dictionary for interpretable, controllable facial reenactment, and it scales to large models trained on extensive datasets.
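
The idea of linear navigation can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `navigate`, the latent size, the number of atoms, and the random dictionary are all assumptions made for the example. In the real model the dictionary is learned and the codes come from an encoder.

```python
import numpy as np

def navigate(z_source, dictionary, coeffs):
    """Shift a latent code along dictionary atoms: z + sum_i a_i * d_i."""
    return z_source + coeffs @ dictionary

# Hypothetical sizes, chosen only for illustration.
latent_dim, num_atoms = 8, 4
rng = np.random.default_rng(0)
dictionary = rng.standard_normal((num_atoms, latent_dim))  # one atom per row
z_source = rng.standard_normal(latent_dim)

# A sparse coefficient vector: only atom 1 is active.
coeffs = np.array([0.0, 1.5, 0.0, 0.0])
z_driven = navigate(z_source, dictionary, coeffs)
```

Because only one coefficient is non-zero, the resulting displacement in latent space lies entirely along a single atom, which is what makes an individual direction easy to inspect and label.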

Findings

01. Outperforms previous methods in self- and cross-reenactment tasks

02. Supports fine-grained, user-guided editing and 3D-aware video manipulation

03. Scalable to models with approximately 1 billion parameters

Abstract

We introduce LIA-X, a novel interpretable portrait animator designed to transfer facial dynamics from a driving video to a source portrait with fine-grained control. LIA-X is an autoencoder that models motion transfer as a linear navigation of motion codes in latent space. Crucially, it incorporates a novel Sparse Motion Dictionary that enables the model to disentangle facial dynamics into interpretable factors. Deviating from previous 'warp-render' approaches, the interpretability of the Sparse Motion Dictionary allows LIA-X to support a highly controllable 'edit-warp-render' strategy, enabling precise manipulation of fine-grained facial semantics in the source portrait. This helps to narrow initial differences with the driving video in terms of pose and expression. Moreover, we demonstrate the scalability of LIA-X by successfully training a large-scale model with approximately 1…
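
The "edit" step of the edit-warp-render strategy can be illustrated in coefficient space. The sketch below is a toy under loudly stated assumptions: the atom labels in `ATOMS`, the helper names `edit` and `motion_code`, the identity-like dictionary, and all coefficient values are hypothetical, and the warp and render stages (learned networks in the paper) are omitted entirely.

```python
import numpy as np

# Hypothetical atom labels; in the paper the atoms are learned, not predefined.
ATOMS = ["yaw", "pitch", "smile"]

def edit(coeffs, atom, delta):
    """Edit step: shift one interpretable coefficient, leaving the input intact."""
    edited = coeffs.copy()
    edited[ATOMS.index(atom)] += delta
    return edited

def motion_code(coeffs, dictionary):
    """Linear combination of dictionary atoms (one atom per row)."""
    return coeffs @ dictionary

dictionary = np.eye(3, 6)                    # toy dictionary: 3 atoms, 6-dim latent
source_coeffs = np.array([0.4, 0.0, 0.0])    # source portrait is turned (yaw = 0.4)
driving_coeffs = np.array([0.0, 0.0, 1.0])   # driving frame is frontal and smiling

# Remove the source's yaw before transfer so the two poses start closer together.
aligned = edit(source_coeffs, "yaw", -0.4)

gap_before = np.linalg.norm(motion_code(source_coeffs - driving_coeffs, dictionary))
gap_after = np.linalg.norm(motion_code(aligned - driving_coeffs, dictionary))
```

In this toy setup `gap_after` is smaller than `gap_before`, mirroring the abstract's point that editing the source first narrows the initial pose and expression difference with the driving video.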

Peer Reviews

Decision · ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 4

Strengths

Novelty & Interpretability – Introducing sparsity in the motion dictionary is a simple yet effective idea that yields human-interpretable motion vectors (e.g., yaw, pitch, smile).

Controllability – The edit-warp-render approach is well-motivated and addresses pose/expression misalignment issues common in talking-head animation.

Weaknesses

Limited Theoretical Depth – The sparse dictionary idea, though effective, is conceptually straightforward (essentially regularization on motion coefficients). The paper lacks deeper analysis of why sparsity leads to disentanglement.

Evaluation Diversity – While results on self- and cross-reenactment are strong, the experiments are limited to face animation datasets; no user study or generalization to non-portrait domains is shown.

Scalability Ceiling – Gains saturate beyond 0.3B parameters (Ta

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The Sparse Motion Dictionary introduces an interpretable regularization mechanism within the LIA framework. Although sparsity is a classical idea, applying it to disentangle latent motion vectors for human-readable semantics is a novel interpretability-driven adaptation. The edit-warp-render paradigm offers a conceptually clean way to integrate user control into a previously self-supervised latent animation pipeline.
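
To make the regularization view concrete, here is a minimal sketch of a sparsity penalty on motion coefficients. The excerpt does not state which regularizer the paper uses, so the L1 form, the function name `l1_sparsity_penalty`, and the `weight` value are assumptions chosen because L1 is the classical sparsity-inducing choice.

```python
import numpy as np

def l1_sparsity_penalty(coeffs, weight=0.01):
    """Mean absolute coefficient value, scaled by a hypothetical loss weight."""
    return weight * np.abs(coeffs).mean()

# Two coefficient vectors with the same Euclidean norm (1.0).
dense = np.array([0.5, -0.5, 0.5, -0.5])   # motion spread over every atom
sparse = np.array([1.0, 0.0, 0.0, 0.0])    # motion concentrated in one atom

dense_penalty = l1_sparsity_penalty(dense)
sparse_penalty = l1_sparsity_penalty(sparse)
```

The sparse vector incurs the lower penalty even though both vectors have equal L2 norm, so a term like this pushes the model to explain each motion with few atoms. Whether that pressure alone yields semantic disentanglement is exactly what the reviewers question.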

Weaknesses

1. The overall architecture (Encoder–Flow–Renderer) and loss formulation are largely inherited from LIA. The paper provides no theoretical justification or analytical evidence explaining why sparsity should induce semantic disentanglement in the motion dictionary. As a result, the contribution is primarily incremental.

2. The claimed “controllability” arises randomly from training statistics and is not guaranteed or reproducible across runs or identities.

3. The model lacks mechanisms (such as m

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. Novel Synthesis of Techniques – The paper presents an elegant integration of two powerful and established methods: GAN and sparse dictionary learning, applying them effectively to the portrait animation domain.

2. Demonstrated Scalability – The experimental results successfully demonstrate the architecture's capacity to scale to high-resolution outputs and handle complex animation tasks.

Weaknesses

Insufficient Ablation of the Sparse Motion Dictionary – The sparse motion dictionary is central to the paper's contributions, yet it is not sufficiently validated through detailed ablation studies. The specific benefits of the dictionary and its sparsity constraint are therefore not fully quantified.

Code & Models

Models

🤗 YaohuiW/LIA-X · model · ♡ 13
