Just Add $\pi$! Pose Induced Video Transformers for Understanding Activities of Daily Living

Dominick Reilly; Srijan Das

arXiv:2311.18840·cs.CV·December 1, 2023·2 cites

TL;DR

This paper introduces PI-ViT, a video transformer that incorporates 2D and 3D human pose information through auxiliary modules to improve recognition of Activities of Daily Living (ADL), achieving state-of-the-art results.

Contribution

The paper presents the first pose-augmented video transformer for ADL. Its auxiliary pose modules are used only during training and are discarded at inference, which improves recognition accuracy.
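The training-only role of the auxiliary modules can be illustrated with a minimal sketch. This is not the authors' implementation: `TinyBackbone`, `LinearHead`, `PoseAugmentedModel`, the pose-regression loss, and the `aux_weight` value are all hypothetical stand-ins chosen for illustration, and no gradient step is shown. The sketch only demonstrates how an auxiliary pose loss could be added to the classification loss during training, and why removing the auxiliary head leaves the inference path unchanged.

```python
import math
import random


class TinyBackbone:
    """Hypothetical stand-in for a video transformer: input vector -> features."""

    def __init__(self, in_dim, feat_dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(feat_dim)]

    def __call__(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]


class LinearHead:
    """Hypothetical linear head applied on top of backbone features."""

    def __init__(self, feat_dim, out_dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)]
                  for _ in range(out_dim)]

    def __call__(self, feats):
        return [sum(wi * fi for wi, fi in zip(row, feats)) for row in self.w]


def cross_entropy(logits, label):
    # Numerically stable log-sum-exp minus the logit of the true class.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class PoseAugmentedModel:
    def __init__(self, in_dim=8, feat_dim=4, num_classes=3, pose_dim=6, seed=0):
        rng = random.Random(seed)
        self.backbone = TinyBackbone(in_dim, feat_dim, rng)
        self.classifier = LinearHead(feat_dim, num_classes, rng)
        # Assumption: the auxiliary module regresses pose from RGB features.
        self.pose_head = LinearHead(feat_dim, pose_dim, rng)

    def training_loss(self, rgb, label, pose, aux_weight=0.5):
        # Training uses both the action label and the pose annotation.
        feats = self.backbone(rgb)
        cls_loss = cross_entropy(self.classifier(feats), label)
        pose_loss = mse(self.pose_head(feats), pose)
        return cls_loss + aux_weight * pose_loss

    def drop_auxiliary(self):
        # After training, the auxiliary head is simply removed.
        self.pose_head = None

    def predict(self, rgb):
        # Inference touches only the backbone and classifier: no pose input.
        logits = self.classifier(self.backbone(rgb))
        return max(range(len(logits)), key=logits.__getitem__)


model = PoseAugmentedModel()
rgb = [0.1 * i for i in range(8)]   # placeholder RGB-derived input
pose = [0.0] * 6                    # placeholder pose target
loss = model.training_loss(rgb, label=1, pose=pose)
before = model.predict(rgb)
model.drop_auxiliary()
after = model.predict(rgb)
```

Because `predict` never calls `pose_head`, the prediction is identical before and after `drop_auxiliary()`, which mirrors the claim that the deployed model needs neither pose data nor extra computation.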

Findings

1. Achieves state-of-the-art performance on three ADL datasets.
2. Requires no pose data and adds no computational cost at inference.
3. Distinguishes visually similar actions and actions observed from multiple viewpoints.

Abstract

Video transformers have become the de facto standard for human action recognition, yet their exclusive reliance on the RGB modality still limits their adoption in certain domains. One such domain is Activities of Daily Living (ADL), where RGB alone is not sufficient to distinguish between visually similar actions, or actions observed from multiple viewpoints. To facilitate the adoption of video transformers for ADL, we hypothesize that the augmentation of RGB with human pose information, known for its sensitivity to fine-grained motion and multiple viewpoints, is essential. Consequently, we introduce the first Pose Induced Video Transformer: PI-ViT (or $\pi$-ViT), a novel approach that augments the RGB representations learned by video transformers with 2D and 3D pose information. The key elements of $\pi$-ViT are two plug-in modules, 2D Skeleton Induction Module and 3D Skeleton…

Code & Models

Repositories

dominickrei/pi-vit
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Stroke Rehabilitation and Recovery · Hand Gesture Recognition Systems