Emotion-Conditioned Short-Horizon Human Pose Forecasting with a Lightweight Predictive World Model

Jingni Huang; Peter Bloodsworth

arXiv:2604.23532·cs.CV·April 28, 2026

TL;DR

This paper explores using facial expression-derived emotion embeddings to improve short-term human pose prediction with a lightweight autoregressive model, showing that emotion signals can enhance prediction accuracy in emotion-driven sequences.

Contribution

It introduces a novel multimodal fusion approach using a gating mechanism within a lightweight predictive world model for emotion-conditioned pose forecasting.

Findings

01 Normalized gating fusion improves prediction accuracy for emotion-driven sequences.

02 Emotion embeddings act as auxiliary signals influencing pose prediction.

03 Counterfactual experiments show trajectory sensitivity to emotion input changes.

Abstract

Short-term human pose prediction plays a crucial role in interactive systems, assistive robots, and emotion-aware human-computer interaction[1-3]. While current trajectory prediction models rely primarily on geometric motion cues, they often overlook the emotional signals that influence human motion dynamics[4-5]. This paper investigates whether emotion embeddings derived from facial expressions can provide auxiliary conditioning signals for short-term pose prediction. To further evaluate multimodal conditioning in a recursive prediction setting, we propose a lightweight autoregressive predictive world model that performs 15-step rolling pose prediction. The framework fuses pose keypoints with emotion embeddings through a learnable gating mechanism and unrolls predictions autoregressively using a recurrent sequence model built on a two-layer LSTM. Experiments…
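The abstract describes the pipeline only at a high level, so the following is a minimal NumPy sketch of the general idea rather than the authors' implementation. It assumes a common sigmoid gate form, g = sigmoid(W_g [p; e] + b_g) with fused feature g * (W_p p) + (1 - g) * (W_e e), and it replaces the two-layer LSTM with a toy tanh recurrent cell to stay short. The names `GatedFusion`, `make_toy_cell`, and `rollout`, the dimensions (34 pose values, e.g. 17 two-dimensional keypoints; 8 emotion values), and the residual pose update are all hypothetical choices made for illustration. Weights are random and untrained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedFusion:
    """Assumed gate form (not taken from the paper):
    g = sigmoid(W_g [p; e] + b_g),  z = g * (W_p p) + (1 - g) * (W_e e)."""

    def __init__(self, pose_dim, emo_dim, hidden_dim, rng):
        self.W_p = rng.normal(0.0, 0.1, (hidden_dim, pose_dim))
        self.W_e = rng.normal(0.0, 0.1, (hidden_dim, emo_dim))
        self.W_g = rng.normal(0.0, 0.1, (hidden_dim, pose_dim + emo_dim))
        self.b_g = np.zeros(hidden_dim)

    def __call__(self, pose, emo):
        gate = sigmoid(self.W_g @ np.concatenate([pose, emo]) + self.b_g)
        return gate * (self.W_p @ pose) + (1.0 - gate) * (self.W_e @ emo)


def make_toy_cell(hidden_dim, pose_dim, rng):
    """Toy tanh recurrent cell standing in for the two-layer LSTM."""
    W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
    W_o = rng.normal(0.0, 0.1, (pose_dim, hidden_dim))

    def step_fn(z, h):
        h = np.tanh(z + W_h @ h)   # update the recurrent state
        return W_o @ h, h          # predicted pose offset, new state

    return step_fn


def rollout(fusion, step_fn, pose0, emo, state, horizon=15):
    """Autoregressive rollout: each predicted pose is fed back as input."""
    poses, pose = [], pose0
    for _ in range(horizon):
        z = fusion(pose, emo)             # fuse current pose with emotion
        delta, state = step_fn(z, state)  # one recurrent step
        pose = pose + delta               # residual update (an assumption)
        poses.append(pose)
    return np.stack(poses)


rng = np.random.default_rng(0)
pose_dim, emo_dim, hidden_dim = 34, 8, 16
fusion = GatedFusion(pose_dim, emo_dim, hidden_dim, rng)
step_fn = make_toy_cell(hidden_dim, pose_dim, rng)

pose0 = rng.normal(size=pose_dim)
emo = rng.normal(size=emo_dim)
emo_cf = rng.normal(size=emo_dim)  # counterfactual emotion embedding

traj = rollout(fusion, step_fn, pose0, emo, np.zeros(hidden_dim))
traj_cf = rollout(fusion, step_fn, pose0, emo_cf, np.zeros(hidden_dim))

print(traj.shape)  # (15, 34)
# Per-step divergence between factual and counterfactual trajectories.
print(np.linalg.norm(traj - traj_cf, axis=1))
```

Holding the initial pose fixed and swapping only the emotion embedding mirrors, in spirit, the counterfactual test mentioned in the findings: any divergence between the two rollouts is attributable to the emotion input. With untrained weights the magnitudes are meaningless; the sketch only shows where the emotion signal enters the recursion.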
