PoseBridge: Bridging the Skeletonization Gap for Zero-Shot Skeleton-Based Action Recognition
Sanghyeon Lee, Jinwoo Kim, and Jong Taek Lee

TL;DR
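PoseBridge is an HPE-aware framework for zero-shot skeleton-based action recognition (ZSSAR). It carries pose-anchored semantic cues from intermediate human pose estimation (HPE) representations into skeleton-text alignment, targeting cues that are no longer explicit once the input has been compressed into skeletons, to improve classification of unseen actions.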
Contribution
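It proposes extracting pose-anchored semantic cues from the same HPE process that produces the skeletons, then transferring them through skeleton-conditioned bridging and semantic prototype adaptation, without adding an RGB action branch or an object detector.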
Findings
Reports significant performance improvements on the NTU-RGB+D, PKU-MMD, and Kinetics benchmarks.
Achieves gains of 13.3 to 17.4 points on the Kinetics-200/400 PURLS benchmark.
Demonstrates the effectiveness of HPE-aware semantic cues in zero-shot recognition.
Abstract
Zero-shot skeleton-based action recognition (ZSSAR) is typically treated as a skeleton-text alignment problem: encode joint-coordinate sequences, align them with language, and classify unseen actions. We argue that this alignment is often too late. Skeletons are not complete action observations, but compressed outputs of human pose estimation (HPE); by the time alignment begins, human-object interactions and pose-relative visual cues may no longer be explicit. We call this upstream semantic loss. To address it, we propose PoseBridge, an HPE-aware ZSSAR framework that bridges intermediate HPE representations to skeleton-text alignment. Rather than adding an RGB action branch or object detector, PoseBridge extracts pose-anchored semantic cues from the same HPE process that produces skeletons, then transfers them through skeleton-conditioned bridging and semantic prototype adaptation.…
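To make the conventional skeleton-text alignment setup concrete, the following is a minimal sketch of the generic zero-shot classification step the abstract describes: a skeleton embedding is compared against text embeddings of unseen class names, and the most similar class is predicted. This is not the PoseBridge method. The function names, embedding dimensions, and random stand-in embeddings are all assumptions for illustration; a real system would obtain these vectors from a trained skeleton encoder and a language model.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def zero_shot_classify(skeleton_emb, class_text_embs):
    """Assign each skeleton embedding to the most similar class text embedding.

    skeleton_emb:    (N, D) array, hypothetical output of a skeleton encoder.
    class_text_embs: (C, D) array, hypothetical text embeddings of unseen classes.
    Returns the predicted class indices (N,) and the similarity matrix (N, C).
    """
    s = l2_normalize(skeleton_emb)
    t = l2_normalize(class_text_embs)
    sims = s @ t.T  # cosine similarity between every sample and every class
    return sims.argmax(axis=-1), sims


# Stand-in data (assumption): 5 unseen classes in a 16-dimensional shared space.
rng = np.random.default_rng(0)
class_text_embs = rng.normal(size=(5, 16))

# Two fake skeleton embeddings placed near the prototypes of classes 2 and 4.
skeleton_emb = class_text_embs[[2, 4]] + 0.01 * rng.normal(size=(2, 16))

pred, sims = zero_shot_classify(skeleton_emb, class_text_embs)
print(pred)
```

In this setup, everything the classifier knows about the action must already be present in the skeleton embedding, which is why cues lost upstream during pose estimation cannot be recovered at the alignment stage.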
