Hierarchical I3D for Sign Spotting
Ryan Wong, Necati Cihan Camgöz, Richard Bowden

TL;DR
This paper introduces a hierarchical I3D model for sign spotting in continuous sign language videos, achieving state-of-the-art results by learning multi-level spatio-temporal features for precise sign localization.
Contribution
The paper proposes a hierarchical sign spotting approach: a hierarchical network head attached to I3D that improves sign localization in continuous videos.
Findings
Achieved a 0.607 F1 score on the ChaLearn 2022 Sign Spotting Challenge.
Top-1 winning solution in the MSSL track of the challenge.
Demonstrated the effectiveness of hierarchical features for sign localization.
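To make the F1 figure above concrete, here is a minimal Python sketch of how an F1 score can be computed for sign spotting by matching predicted sign intervals to ground-truth intervals using temporal intersection-over-union (IoU). The function names, the `Span` layout, the greedy matching, and the single `iou_threshold` of 0.5 are illustrative assumptions; the challenge's official evaluation protocol may differ.

```python
from typing import List, Tuple

# Hypothetical layout: (sign label, start frame, end frame), end exclusive.
Span = Tuple[str, int, int]

def temporal_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two frame intervals."""
    inter = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    union = (a[2] - a[1]) + (b[2] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def spotting_f1(preds: List[Span], truths: List[Span],
                iou_threshold: float = 0.5) -> float:
    """F1 over spotted signs; each ground-truth span is matched at most once."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            # Only unmatched spans with the same sign label are candidates.
            if j in matched or t[0] != p[0]:
                continue
            iou = temporal_iou(p, t)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no valid match
    fn = len(truths) - tp  # ground-truth signs that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

truths = [("A", 0, 10), ("B", 20, 30)]
preds = [("A", 2, 10), ("B", 28, 40), ("C", 50, 60)]
print(spotting_f1(preds, truths))
```

In this toy case only the prediction for sign "A" overlaps its ground truth enough to count, so a correct label alone is not sufficient: the predicted interval must also be well localised.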
Abstract
Most of the vision-based sign language research to date has focused on Isolated Sign Language Recognition (ISLR), where the objective is to predict a single sign class given a short video clip. Although there has been significant progress in ISLR, its real-life applications are limited. In this paper, we focus on the challenging task of Sign Spotting instead, where the goal is to simultaneously identify and localise signs in continuous co-articulated sign videos. To address the limitations of current ISLR-based models, we propose a hierarchical sign spotting approach which learns coarse-to-fine spatio-temporal sign features to take advantage of representations at various temporal levels and provide more precise sign localisation. Specifically, we develop Hierarchical Sign I3D model (HS-I3D) which consists of a hierarchical network head that is attached to the existing spatio-temporal…
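The coarse-to-fine idea in the abstract can be illustrated with a small Python sketch. This is not the HS-I3D architecture, which learns its features with a network head on top of I3D; it only shows, under simplified assumptions, how per-frame scores viewed at several temporal resolutions can be combined into one per-frame signal. The function names and the window sizes `(4, 2, 1)` are hypothetical.

```python
from typing import List, Sequence

def avg_pool(scores: List[float], window: int) -> List[float]:
    """Average non-overlapping windows; the last window may be shorter."""
    return [sum(scores[i:i + window]) / len(scores[i:i + window])
            for i in range(0, len(scores), window)]

def upsample(pooled: List[float], window: int, length: int) -> List[float]:
    """Repeat each pooled value over its window and trim to `length` frames."""
    return [pooled[i // window] for i in range(length)]

def coarse_to_fine(scores: List[float],
                   windows: Sequence[int] = (4, 2, 1)) -> List[float]:
    """Average the per-frame scores seen at each temporal level."""
    levels = [upsample(avg_pool(scores, w), w, len(scores)) for w in windows]
    return [sum(vals) / len(levels) for vals in zip(*levels)]

# Toy per-frame scores for one sign class: absent for two frames, then present.
frame_scores = [0.0, 0.0, 1.0, 1.0]
print(coarse_to_fine(frame_scores))
```

The coarse level contributes context from the whole clip, while the finest level preserves the exact frame at which the score changes, which is the intuition behind using several temporal levels for localisation.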
Taxonomy
Topics: Hand Gesture Recognition Systems · Gait Recognition and Analysis · Hearing Impairment and Communication
