Exploiting Ensemble Learning for Cross-View Isolated Sign Language Recognition

Fei Wang, Kun Li, Yiqi Nie, Zhangling Duan, Peng Zou, Zhiliang Wu, Yuwei Wang, and Yanyan Wei

arXiv:2502.02196 · cs.CV · February 5, 2025

Open Access · 1 Repo

TL;DR

This paper presents an ensemble learning approach built on a multi-dimensional Video Swin Transformer for cross-view isolated sign language recognition. It targets viewpoint variability and ranked 3rd in both the RGB and RGB-D tracks of the CV-ISLR challenge at WWW 2025.

Contribution

The paper introduces an ensemble strategy built on a multi-dimensional Video Swin Transformer for cross-view sign language recognition, aimed at improving robustness and generalization across camera viewpoints.

Findings

01 Ranked 3rd in both the RGB and RGB-D tracks of the CV-ISLR challenge at WWW 2025.

02 Demonstrated improved cross-view recognition performance.

03 Validated the effectiveness of ensemble learning for sign language recognition.

Abstract

In this paper, we present our solution to the Cross-View Isolated Sign Language Recognition (CV-ISLR) challenge held at WWW 2025. CV-ISLR addresses a critical issue in traditional Isolated Sign Language Recognition (ISLR), where existing datasets predominantly capture sign language videos from a frontal perspective, while real-world camera angles often vary. To accurately recognize sign language from different viewpoints, models must be capable of understanding gestures from multiple angles, making cross-view recognition challenging. To address this, we explore the advantages of ensemble learning, which enhances model robustness and generalization across diverse views. Our approach, built on a multi-dimensional Video Swin Transformer model, leverages this ensemble strategy to achieve competitive performance. Finally, our solution ranked 3rd in both the RGB-based ISLR and RGB-D-based…
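The abstract does not say how the ensemble members are combined, so the following is only a minimal sketch of one common choice: weighted averaging of per-model softmax probabilities (late fusion). The function names `softmax` and `ensemble_predict`, the weighting scheme, and the toy logits are all assumptions for illustration and are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Convert one model's raw class scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits, weights=None):
    """Fuse several models' logits for ONE video by weighted probability averaging.

    per_model_logits: list of lists, one list of class logits per model
                      (hypothetical input; e.g. one entry per trained variant).
    weights:          optional non-negative weight per model (assumed scheme).
    Returns (predicted_class_index, fused_probabilities).
    """
    n_models = len(per_model_logits)
    if n_models == 0:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * n_models  # plain averaging by default
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(per_model_logits[0])
    fused = [0.0] * n_classes
    for logits, w in zip(per_model_logits, weights):
        probs = softmax(logits)
        for c, p in enumerate(probs):
            fused[c] += (w / weight_sum) * p

    predicted = max(range(n_classes), key=fused.__getitem__)
    return predicted, fused


# Toy example: three hypothetical models scoring three sign classes.
toy_logits = [
    [2.0, 1.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.5, 2.5, 0.0],
]
label, probs = ensemble_predict(toy_logits)
print(label, [round(p, 3) for p in probs])
```

Averaging probabilities rather than raw logits keeps each model's contribution on the same scale, which matters when the members differ in input size or modality; other fusion rules (majority voting, logit averaging, learned weights) would slot into the same interface.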

Code & Models

Repositories

jiafei127/cv_islr_www2025
PyTorch · Official

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Gait Recognition and Analysis

Methods: Attention Is All You Need · Label Smoothing · Layer Normalization · Stochastic Depth · Linear Layer · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer