SLVideo: A Sign Language Video Moment Retrieval Framework

Gonçalo Vinagre Martins; João Magalhães; Afonso Quinaz; Carla Viegas; Sofia Cavaco

arXiv:2407.15668·cs.CV·November 7, 2024

TL;DR

SLVideo is a novel sign language video retrieval system that uses facial and hand embeddings, enabling text-based search and sign similarity queries in Portuguese Sign Language videos.

Contribution

It introduces a comprehensive retrieval framework that incorporates facial expressions and a thesaurus for sign similarity, advancing sign language video search technology.

Findings

01. Promising zero-shot retrieval performance

02. Effective use of CLIP embeddings for sign language

03. Supports sign similarity and annotation editing

Abstract

SLVideo is a video moment retrieval system for Sign Language videos that incorporates facial expressions, addressing a gap in existing technology. The system extracts embedding representations of the hand and face signs from video frames to capture the signs in their entirety, enabling users to search for a specific sign language video segment with text queries. The dataset is a collection of eight hours of annotated Portuguese Sign Language videos, and a CLIP model is used to generate the embeddings. The initial results are promising in a zero-shot setting. In addition, SLVideo incorporates a thesaurus that lets users search for signs similar to those retrieved, using the video segment embeddings, and it also supports editing and creating sign language video annotations. Project web page: https://novasearch.github.io/SLVideo/
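The retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes that each video segment and each text query has already been mapped to a vector in a shared embedding space (the abstract says a CLIP model produces the embeddings) and that candidates are ranked by cosine similarity, which is an assumption here, not something the abstract states. The names `cosine` and `rank_segments`, the segment ids, and the 3-dimensional toy vectors are all hypothetical stand-ins; how SLVideo actually combines hand and face embeddings is not specified in this listing.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_segments(query_vec, segment_vecs, top_k=3):
    """Return the top_k (segment_id, score) pairs, most similar first."""
    scored = [(seg_id, cosine(query_vec, vec)) for seg_id, vec in segment_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for per-segment embeddings (hypothetical values, not real CLIP output).
segments = {
    "seg_a": [1.0, 0.0, 0.0],
    "seg_b": [0.7, 0.7, 0.0],
    "seg_c": [0.0, 0.0, 1.0],
}

# Text search: a stand-in for the embedding of a text query.
text_query = [0.9, 0.1, 0.0]
results = rank_segments(text_query, segments, top_k=2)

# Thesaurus-style lookup: use a retrieved segment's own embedding as the query,
# excluding that segment from the candidates.
others = {k: v for k, v in segments.items() if k != "seg_a"}
similar = rank_segments(segments["seg_a"], others, top_k=1)

print(results)
print(similar)
```

The same ranking function serves both use cases: a text query and a "find similar signs" query differ only in where the query vector comes from.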

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training · Focus