Causal-Inspired Multitask Learning for Video-Based Human Pose Estimation
Haipeng Chen, Sifan Wu, Zhigang Wang, Yifang Yin, Yingying Jiao, Yingda Lyu, Zhenguang Liu

TL;DR
This paper introduces a causal-inspired multitask learning framework for video-based human pose estimation, enhancing robustness and interpretability by modeling causal relationships and prioritizing keypoint-relevant features.
Contribution
It pioneers a causal perspective in pose estimation, integrating auxiliary tasks for causal reasoning and a token importance module for improved interpretability and performance.
Findings
Outperforms state-of-the-art on three benchmark datasets
Enhances model robustness to challenging scenes
Improves interpretability by identifying causal tokens
Abstract
Video-based human pose estimation has long been a fundamental yet challenging problem in computer vision. Previous studies focus on spatio-temporal modeling through the enhancement of architecture design and optimization strategies. However, they overlook the causal relationships among the joints, leading to models that may be overly tailored and thus perform poorly on challenging scenes. Adequate causal reasoning capability, coupled with good model interpretability, is therefore indispensable for achieving reliable results. In this paper, we pioneer a causal perspective on pose estimation and introduce a causal-inspired multitask learning framework consisting of two stages. In the first stage, we try to endow the model with causal spatio-temporal modeling ability by introducing two self-supervised auxiliary tasks. Specifically, these auxiliary tasks…
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
