Three-Stream 3D/1D CNN for Fine-Grained Action Classification and Segmentation in Table Tennis

Pierre-Etienne Martin (MPI-EVA); Jenny Benois-Pineau (UB); Renaud Péteri (MIA); Julien Morlier (UB)

arXiv:2109.14306 · cs.CV · September 30, 2021

TL;DR

This paper introduces a three-stream CNN that fuses RGB, optical flow, and pose data for fine-grained action classification and segmentation in table tennis videos, improving accuracy and convergence speed.

Contribution

It presents a novel multimodal fusion approach, with attention blocks in each branch, that improves stroke detection and classification in sports videos.
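The summary does not specify the exact form of the attention blocks. As a generic illustration only, one common pattern is a residual soft-attention gate, where features are reweighted by a sigmoid mask computed from the features themselves: out = x + x * sigmoid(g(x)). The sketch below is not the paper's block; it assumes a hypothetical linear map `g` (parameters `w`, `b`) in place of the convolutional layers a real branch would use.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def residual_attention(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Generic residual soft-attention gate (illustrative, not the paper's exact block).

    x: features of shape (T, C), i.e. T time steps and C channels.
    w, b: hypothetical parameters of a linear map g(x) = x @ w + b,
          with shapes (C, C) and (C,), standing in for convolutions.
    """
    mask = sigmoid(x @ w + b)  # attention weights in (0, 1), same shape as x
    return x + x * mask        # residual path keeps the original signal


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
w = rng.standard_normal((4, 4))
b = np.zeros(4)
y = residual_attention(x, w, b)
print(y.shape)  # (8, 4)
```

The residual term means the gate can only amplify features (by a factor between 1 and 2 here), never erase them, which is one reason such blocks are often reported to ease training.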

Findings

01

Faster convergence compared to previous methods

02

Improved classification accuracy for table tennis strokes

03

Effective joint segmentation and classification performance
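Joint segmentation and classification of untrimmed video ultimately requires turning per-frame (or per-window) predictions into labelled temporal segments. The summary does not describe how this step is done; the sketch below is a generic run-length grouping that assumes a classifier has already assigned one label per frame, with a hypothetical background label `0` for frames that contain no stroke.

```python
from itertools import groupby
from typing import List, Tuple


def frames_to_segments(labels: List[int], background: int = 0) -> List[Tuple[int, int, int]]:
    """Group consecutive identical frame labels into (start, end, label) segments.

    `end` is exclusive. Runs of the hypothetical `background` label are dropped.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        if label != background:
            segments.append((start, start + length, label))
        start += length
    return segments


frame_labels = [0, 0, 3, 3, 3, 0, 5, 5, 0]
print(frames_to_segments(frame_labels))  # [(2, 5, 3), (6, 8, 5)]
```

In practice a minimum segment length or temporal smoothing would usually be applied first, so that single-frame label flips do not produce spurious strokes.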

Abstract

This paper proposes a method for fusing modalities extracted from video through a three-stream network with spatio-temporal and temporal convolutions for fine-grained action classification in sport. It is applied to the TTStroke-21 dataset, which consists of untrimmed videos of table tennis games. The goal is to detect and classify table tennis strokes in the videos, the first step of a larger scheme aimed at giving players feedback to improve their performance. The three modalities are raw RGB data, the computed optical flow and the estimated pose of the player. The network consists of three branches with attention blocks. Features are fused at the last stage of the network using bilinear layers. Compared to previous approaches, the use of three modalities allows faster convergence and better performance on both tasks: classification of strokes with known temporal boundaries…
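The abstract states that the three branches are fused at the last stage with bilinear layers, but does not give the exact arrangement. A bilinear layer maps two vectors x1 and x2 to y_k = x1ᵀ W_k x2 + b_k. The sketch below assumes, purely for illustration, that RGB and optical-flow features are fused first and the result is then fused with the pose features; the feature sizes, the random weights and the number of output classes are all hypothetical.

```python
import numpy as np


def bilinear(x1: np.ndarray, x2: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Bilinear layer: y[k] = x1 @ W[k] @ x2 + b[k].

    x1: (d1,), x2: (d2,), W: (k, d1, d2), b: (k,) -> y: (k,)
    """
    return np.einsum("i,kij,j->k", x1, W, x2) + b


def three_stream_fusion(rgb, flow, pose, params):
    """Hypothetical late fusion order: (RGB x flow) first, then (result x pose)."""
    h = bilinear(rgb, flow, params["W1"], params["b1"])
    return bilinear(h, pose, params["W2"], params["b2"])


rng = np.random.default_rng(0)
d_rgb, d_flow, d_pose, d_hidden, n_classes = 16, 16, 8, 12, 21  # hypothetical sizes
params = {
    "W1": rng.standard_normal((d_hidden, d_rgb, d_flow)) * 0.1,
    "b1": np.zeros(d_hidden),
    "W2": rng.standard_normal((n_classes, d_hidden, d_pose)) * 0.1,
    "b2": np.zeros(n_classes),
}
logits = three_stream_fusion(
    rng.standard_normal(d_rgb), rng.standard_normal(d_flow), rng.standard_normal(d_pose), params
)
print(logits.shape)  # (21,)
```

Unlike concatenation, a bilinear layer lets every feature of one stream interact multiplicatively with every feature of the other. In PyTorch, `torch.nn.Bilinear(in1_features, in2_features, out_features)` computes the same form.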

Code & Models

Repositories

rwightman/posenet-python
tf · Official
