NCSTR: Node-Centric Decoupled Spatio-Temporal Reasoning for Video-based Human Pose Estimation

Quang Dang Huynh; Xuefei Yin; Andrew Busch; Hugo G. Espinosa; Alan Wee-Chung Liew; Matthew T.O. Worsey; Yanming Zhu

arXiv:2603.20323 · cs.CV · March 24, 2026

Open Access

TL;DR

This paper introduces a node-centric framework that explicitly models visual, temporal, and structural information for video-based human pose estimation, and reports state-of-the-art results on three video pose benchmarks.

Contribution

The paper proposes a novel node-centric approach with a visuo-temporal joint embedding, attention-driven pose-query encoder, and dual-branch spatio-temporal graph for enhanced pose accuracy.

Findings

01. Outperforms state-of-the-art on three video pose benchmarks.
02. Explicit node-centric reasoning improves spatio-temporal modeling.
03. Adaptive fusion of local and global cues enhances joint prediction accuracy.

Abstract

Video-based human pose estimation remains challenged by motion blur, occlusion, and complex spatio-temporal dynamics. Existing methods often rely on heatmaps or implicit spatio-temporal feature aggregation, which limits joint topology expressiveness and weakens cross-frame consistency. To address these problems, we propose a novel node-centric framework that explicitly integrates visual, temporal, and structural reasoning for accurate pose estimation. First, we design a visuo-temporal velocity-based joint embedding that fuses sub-pixel joint cues and inter-frame motion to build appearance- and motion-aware representations. Then, we introduce an attention-driven pose-query encoder, which applies attention over joint-wise heatmaps and frame-wise features to map the joint representations into a pose-aware node space, generating image-conditioned joint-aware node embeddings. Building upon…
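
To make the idea of a velocity-based joint embedding more concrete, here is a minimal sketch in plain Python. The abstract does not give the paper's actual formulation, so everything here is an assumption made for illustration: sub-pixel joint cues are approximated with a soft-argmax over each joint heatmap, inter-frame motion with a backward difference of those coordinates, and the "embedding" is simply the concatenated vector (x, y, vx, vy). The function names `soft_argmax_2d`, `joint_velocities`, and `joint_embedding` and the sharpness parameter `beta` are hypothetical and do not come from the paper.

```python
import math


def soft_argmax_2d(heatmap, beta=10.0):
    """Sub-pixel (x, y) estimate: softmax-weighted mean of pixel coordinates.

    `beta` is a hypothetical sharpness parameter; larger values pull the
    estimate toward the hard argmax.
    """
    peak = max(max(row) for row in heatmap)  # subtract peak for numerical stability
    total = x_acc = y_acc = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            weight = math.exp(beta * (value - peak))
            total += weight
            x_acc += weight * x
            y_acc += weight * y
    return (x_acc / total, y_acc / total)


def joint_velocities(coords_per_frame):
    """Backward difference of joint coordinates; the first frame gets zero velocity.

    coords_per_frame[t][j] is the (x, y) position of joint j in frame t.
    """
    velocities = []
    for t, frame in enumerate(coords_per_frame):
        if t == 0:
            velocities.append([(0.0, 0.0) for _ in frame])
        else:
            prev = coords_per_frame[t - 1]
            velocities.append(
                [(x - px, y - py) for (x, y), (px, py) in zip(frame, prev)]
            )
    return velocities


def joint_embedding(heatmaps_per_frame):
    """Assumed toy embedding: one (x, y, vx, vy) tuple per joint per frame.

    heatmaps_per_frame[t][j] is a 2D list holding the heatmap of joint j in frame t.
    """
    coords = [[soft_argmax_2d(h) for h in frame] for frame in heatmaps_per_frame]
    vels = joint_velocities(coords)
    return [
        [(x, y, vx, vy) for (x, y), (vx, vy) in zip(frame_c, frame_v)]
        for frame_c, frame_v in zip(coords, vels)
    ]
```

Calling `joint_embedding` on a list of frames, each holding one heatmap per joint, returns per-joint position and velocity tuples that a downstream encoder could consume. A learned model would presumably also fuse appearance features sampled at those locations, which this sketch leaves out.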

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Human Motion and Animation