SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval

Longtao Jiang, Min Wang, Zecheng Li, Yao Fang, Wengang Zhou, Houqiang Li

arXiv:2407.16394 · cs.CV · July 24, 2024

TL;DR

This paper introduces SEDS, a dual-stream encoder that combines pose and RGB data with attention mechanisms for improved sign language retrieval, achieving superior performance over existing methods.

Contribution

The paper proposes a novel framework for sign language retrieval that integrates pose and RGB modalities, using a cross-gloss attention fusion module and a fine-grained matching objective.
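
The contribution names a cross-gloss attention fusion of the pose and RGB streams but gives no details on this page. The sketch below is hypothetical and is not the SEDS code. It assumes that each clip's fused feature attends to neighbouring clips of both streams within a fixed window, using scaled dot-product attention. The window size, the summed query and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def local_dual_stream_attention(
    pose: Sequence[Vector], rgb: Sequence[Vector], window: int = 1
) -> List[Vector]:
    """Fuse per-clip pose and RGB features with windowed attention.

    Hypothetical illustration: the query for clip i is the sum of its pose
    and RGB features. The keys/values are the pose and RGB features of
    clips i - window .. i + window (same-modality and cross-modality neighbours).
    """
    if len(pose) != len(rgb):
        raise ValueError("both streams need the same number of clips")
    dim = len(pose[0])
    fused = []
    for i in range(len(pose)):
        query = [p + r for p, r in zip(pose[i], rgb[i])]
        lo, hi = max(0, i - window), min(len(pose), i + window + 1)
        neighbours = [pose[j] for j in range(lo, hi)] + [rgb[j] for j in range(lo, hi)]
        # Scaled dot-product scores, normalised into attention weights.
        weights = softmax([dot(query, k) / math.sqrt(dim) for k in neighbours])
        # The output is the attention-weighted average of the neighbour features.
        fused.append([sum(w * n[t] for w, n in zip(weights, neighbours)) for t in range(dim)])
    return fused
```

Setting `window=0` restricts each clip to its own pose and RGB features. Larger windows let context from adjacent clips flow into the fused feature.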

Findings

1. Significantly outperforms state-of-the-art methods on multiple datasets.
2. Efficient end-to-end training with lightweight networks.
3. Effective fusion of local and global sign language features.
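
This page reports retrieval gains without listing the metrics. Text-video retrieval results are commonly reported as Recall@K, and the sketch below shows how that metric can be computed from a similarity matrix. It is a minimal sketch under stated assumptions rather than the paper's evaluation code. It assumes the ground-truth match for query i is gallery item i, and ties are counted in the query's favour.

```python
from typing import Sequence


def recall_at_k(sims: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose ground-truth item ranks in the top k.

    sims[i][j] is the similarity between query i and gallery item j.
    Assumption: the correct item for query i is gallery item i.
    """
    hits = 0
    for i, row in enumerate(sims):
        # 0-based rank = number of gallery items scored strictly higher
        # than the ground-truth item (ties favour the query).
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sims)


sims = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]
for k in (1, 2, 3):
    print(f"R@{k}: {recall_at_k(sims, k):.1f}")
```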

Abstract

Unlike traditional video retrieval, sign language retrieval is more biased towards understanding the semantic information of the human actions contained in video clips. Previous works typically encode only RGB videos to obtain high-level semantic features, causing local action details to be drowned in a large amount of redundant visual information. Furthermore, existing RGB-based sign retrieval works suffer from the huge memory cost of embedding dense visual data in end-to-end training, and therefore adopt an offline RGB encoder instead, leading to suboptimal feature representations. To address these issues, we propose a novel sign language representation framework called Semantically Enhanced Dual-Stream Encoder (SEDS), which integrates Pose and RGB modalities to represent the local and global information of sign language videos. Specifically, the Pose encoder embeds the coordinates of keypoints…
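
The abstract says the Pose encoder embeds keypoint coordinates, but the sentence is truncated here, so the encoder's architecture is not given. The following is a minimal sketch of one simple way to turn per-frame keypoints into feature vectors. It assumes a single linear projection of mean-centred (x, y) coordinates. The centring step, the function names and the toy weights are all hypothetical and are not the SEDS Pose encoder.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def center_keypoints(joints: Sequence[Point]) -> List[Point]:
    # Assumed preprocessing: subtract the mean joint position so the
    # embedding does not depend on where the signer stands in the frame.
    mean_x = sum(x for x, _ in joints) / len(joints)
    mean_y = sum(y for _, y in joints) / len(joints)
    return [(x - mean_x, y - mean_y) for x, y in joints]


def embed_pose_frames(
    frames: Sequence[Sequence[Point]],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[List[float]]:
    """Map each frame's keypoints to a feature vector with one linear layer.

    `weight` has one row per output feature, each holding 2 * num_joints
    entries. `weight` and `bias` stand in for learned parameters
    (hypothetical, not taken from the SEDS repository).
    """
    features = []
    for joints in frames:
        # Flatten [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...].
        flat = [c for point in center_keypoints(joints) for c in point]
        if any(len(row) != len(flat) for row in weight):
            raise ValueError("each weight row needs 2 * num_joints entries")
        features.append(
            [sum(w * v for w, v in zip(row, flat)) + b for row, b in zip(weight, bias)]
        )
    return features


frames = [[(0.0, 0.0), (1.0, 2.0)], [(0.5, 0.5), (1.5, 1.0)]]
weight = [[1.0, 0.0, 0.0, 1.0], [0.5, 0.5, -0.5, -0.5]]
bias = [0.0, 0.1]
print(embed_pose_frames(frames, weight, bias))
```

Because of the centring step, shifting every keypoint in a frame by the same offset leaves that frame's feature unchanged.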

Code & Models

Repositories

longtaojiang/seds (PyTorch, official)

Taxonomy

Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training
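
The taxonomy tags the paper with Softmax and Contrastive Language-Image Pre-training. This page does not state the SEDS training loss, so the block below only illustrates the generic CLIP-style symmetric InfoNCE objective over a batch of paired video and text embeddings. The function names and the temperature of 0.07 (a common default) are assumptions.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalise so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax(row: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_info_nce(
    video: Sequence[Sequence[float]],
    text: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """CLIP-style loss where video[i] and text[i] form the positive pair."""
    v = [_normalize(x) for x in video]
    t = [_normalize(x) for x in text]
    n = len(v)
    # logits[i][j] = cosine(video_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t] for vi in v]
    # Video-to-text: softmax over each row; text-to-video: over each column.
    v2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return (v2t + t2v) / 2
```

In a PyTorch codebase, the same loss is usually written as the mean of `torch.nn.functional.cross_entropy` applied to the logit matrix and to its transpose, with `torch.arange(n)` as the targets.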