PoseBridge: Bridging the Skeletonization Gap for Zero-Shot Skeleton-Based Action Recognition

Sanghyeon Lee, Jinwoo Kim, and Jong Taek Lee

arXiv:2605.11497 · cs.CV · May 13, 2026

TL;DR

PoseBridge is an HPE-aware framework for zero-shot skeleton-based action recognition. It bridges intermediate human pose estimation (HPE) representations to skeleton-text alignment, recovering semantic cues that are lost when video is compressed into skeletons.

Contribution

The paper proposes a method that uses intermediate human pose estimation (HPE) features to improve zero-shot skeleton-based action recognition.

Findings

1. Reports significant performance improvements on the NTU-RGB+D, PKU-MMD, and Kinetics benchmarks.

2. Achieves gains of 13.3 to 17.4 points on the Kinetics-200/400 PURLS benchmark.

3. Demonstrates the effectiveness of HPE-aware semantic cues in zero-shot recognition.

Abstract

Zero-shot skeleton-based action recognition (ZSSAR) is typically treated as a skeleton-text alignment problem: encode joint-coordinate sequences, align them with language, and classify unseen actions. We argue that this alignment is often too late. Skeletons are not complete action observations, but compressed outputs of human pose estimation (HPE); by the time alignment begins, human-object interactions and pose-relative visual cues may no longer be explicit. We call this upstream semantic loss. To address it, we propose PoseBridge, an HPE-aware ZSSAR framework that bridges intermediate HPE representations to skeleton-text alignment. Rather than adding an RGB action branch or object detector, PoseBridge extracts pose-anchored semantic cues from the same HPE process that produces skeletons, then transfers them through skeleton-conditioned bridging and semantic prototype adaptation.…
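
The standard skeleton-text alignment setup that the abstract starts from (encode a skeleton sequence, compare it with text embeddings of class names, pick the closest unseen class) can be illustrated with a minimal sketch. This is not PoseBridge itself and not the authors' code: the function names `cosine` and `zero_shot_classify` are hypothetical, and the three-dimensional embeddings are hand-made toy values standing in for the outputs of a skeleton encoder and a text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def zero_shot_classify(skeleton_embedding, class_text_embeddings):
    """Return the unseen class whose text embedding is closest to the skeleton embedding."""
    scores = {
        label: cosine(skeleton_embedding, text_embedding)
        for label, text_embedding in class_text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy stand-ins (assumptions): in a real system these would come from a text
# encoder applied to class names and a skeleton encoder applied to joint sequences.
unseen_classes = {
    "drink water": [0.9, 0.1, 0.0],
    "brush teeth": [0.1, 0.9, 0.0],
    "kick": [0.0, 0.1, 0.9],
}
skeleton_embedding = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(skeleton_embedding, unseen_classes)
print(label)
```

In this formulation the classifier only sees what the skeleton embedding encodes. Cues that were dropped before that point, such as the object a person is holding, cannot be recovered at alignment time, which is the gap the abstract calls upstream semantic loss.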
