Improving Continuous Sign Language Recognition with Adapted Image Models

Lianyu Hu; Tongkai Shi; Liqing Gao; Zekang Liu; Wei Feng

arXiv:2404.08226·cs.CV·April 15, 2024·3 cites

TL;DR

This paper introduces AdaptSign, a lightweight adaptation strategy for large vision-language models like CLIP, enabling efficient and effective continuous sign language recognition while preserving pretraining knowledge.

Contribution

AdaptSign keeps CLIP's parameters fixed for frame-wise feature extraction and adds learnable modules for spatial and temporal modeling, which keeps adaptation efficient while improving performance on CSLR tasks.
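The frozen-backbone idea can be illustrated with a minimal PyTorch sketch. This is not the authors' AdaptSign implementation: the class name `FrozenBackboneWithAdapter`, the `Conv1d` temporal module, the per-frame classifier, and all dimensions are assumptions made for illustration, and a tiny stand-in network takes the place of CLIP so the example runs without downloading weights.

```python
import torch
import torch.nn as nn


class FrozenBackboneWithAdapter(nn.Module):
    """Frozen frame-wise feature extractor plus a small learnable temporal module.

    Hypothetical illustration only; not the AdaptSign architecture.
    """

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        # Freeze every backbone parameter so pretrained weights are never updated.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Assumed lightweight temporal module: a 1D convolution over the frame axis.
        self.temporal = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)
        # Assumed per-frame classifier on top of the temporally mixed features.
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t = frames.shape[:2]
        with torch.no_grad():  # no gradients flow into the frozen backbone
            feats = self.backbone(frames.flatten(0, 1))  # (b * t, feat_dim)
        feats = feats.reshape(b, t, -1).transpose(1, 2)  # (b, feat_dim, t)
        feats = torch.relu(self.temporal(feats)).transpose(1, 2)  # (b, t, feat_dim)
        return self.classifier(feats)  # (b, t, num_classes)


# Stand-in for a pretrained image encoder (an assumption; CLIP itself is not loaded).
stand_in = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32))
model = FrozenBackboneWithAdapter(stand_in, feat_dim=32, num_classes=10)

logits = model(torch.randn(2, 5, 3, 8, 8))  # 2 clips of 5 frames each
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

Only the parameters listed in `trainable` would be passed to an optimizer, so the cost of adaptation scales with the added modules rather than with the backbone.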

Findings

01. AdaptSign outperforms existing CSLR methods on multiple benchmarks.

02. The additional modules add only 3.2% extra computation.

03. Visualizations show effective focus on informative regions and trajectories.

Abstract

The growth of web-scale weakly labelled image-text pairs has greatly facilitated the development of large-scale vision-language models (e.g., CLIP), which have shown impressive generalization performance across a range of downstream tasks. However, their massive model size and the scarcity of available data make it impractical to fine-tune the whole model for downstream tasks. Moreover, a fully fine-tuned model easily forgets the generic knowledge acquired during pretraining and overfits the downstream data. To adapt these large vision-language models (e.g., CLIP) to continuous sign language recognition (CSLR) efficiently while preserving their generalizability, we propose a novel strategy, AdaptSign. Specifically, CLIP is adopted as the visual backbone to extract frame-wise features, with its parameters fixed, and a set of learnable…

Code & Models

Repositories

hulianyuyy/adaptsign (PyTorch, official)

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition

Methods: Sparse Evolutionary Training · Circular Smooth Label · Contrastive Language-Image Pre-training