SignDATA: Data Pipeline for Sign Language Translation

Kuanwei Chen; Tingyi Lin

arXiv:2604.20357·cs.CV·April 23, 2026

TL;DR

SignDATA is a configurable, standardized preprocessing toolkit for sign language datasets that facilitates consistent data preparation, supports multiple backends, and enhances reproducibility in sign language research.

Contribution

The paper introduces SignDATA, a flexible, reproducible preprocessing pipeline for sign language data that unifies heterogeneous datasets and supports multiple extraction backends.

Findings

01 SignDATA is validated through a backend comparison and preprocessing ablations.

02 The toolkit demonstrates privacy-aware video generation.

03 It provides a reproducible, configurable preprocessing layer for sign-language research.

Abstract

Sign-language datasets are difficult to preprocess consistently because they vary in annotation schema, clip timing, signer framing, and privacy constraints. Existing work usually reports downstream models, while the preprocessing pipeline that converts raw video into training-ready pose or video artifacts remains fragmented, backend-specific, and weakly documented. We present SignDATA, a config-driven preprocessing toolkit that standardizes heterogeneous sign-language corpora into comparable outputs for learning. The system supports two end-to-end recipes: a pose recipe that performs acquisition, manifesting, person localization, clipping, cropping, landmark extraction, normalization, and WebDataset export, and a video recipe that replaces pose extraction with signer-cropped video packaging. SignDATA exposes interchangeable MediaPipe and MMPose backends behind a common interface, typed…
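The idea of interchangeable backends behind a common interface, selected from a config, can be illustrated with a small sketch. Everything here is hypothetical and is not SignDATA's actual API: the names `LandmarkBackend`, `StubBackend`, `BACKENDS`, and `run_extraction`, the `backend` config key, and the landmark counts are placeholders chosen for illustration. The stub returns dummy coordinates so the example runs without MediaPipe or MMPose installed. Only the stage names in `POSE_STAGES` are taken from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

# Stage order of the pose recipe, as listed in the abstract.
POSE_STAGES = [
    "acquisition", "manifesting", "person_localization", "clipping",
    "cropping", "landmark_extraction", "normalization", "webdataset_export",
]

Landmark = Tuple[float, float]
Frame = List[List[float]]  # toy grayscale frame: rows of pixel values


class LandmarkBackend(Protocol):
    """Hypothetical common interface that every backend would satisfy."""

    name: str

    def extract(self, frame: Frame) -> List[Landmark]:
        ...


@dataclass
class StubBackend:
    """Stand-in for a real extractor; returns the frame centre repeatedly."""

    name: str
    num_landmarks: int  # placeholder count, not a real model's output size

    def extract(self, frame: Frame) -> List[Landmark]:
        height, width = len(frame), len(frame[0])
        return [(width / 2, height / 2)] * self.num_landmarks


# Registry keyed by the (assumed) config value that selects a backend.
BACKENDS: Dict[str, LandmarkBackend] = {
    "mediapipe": StubBackend("mediapipe", 3),
    "mmpose": StubBackend("mmpose", 5),
}


def run_extraction(config: Dict[str, str], frames: List[Frame]) -> List[List[Landmark]]:
    """Run the configured backend over every frame of a clip."""
    backend = BACKENDS[config["backend"]]
    return [backend.extract(frame) for frame in frames]


if __name__ == "__main__":
    clip = [[[0.0] * 4 for _ in range(2)]]  # one 2x4 frame
    for choice in ("mediapipe", "mmpose"):
        print(choice, run_extraction({"backend": choice}, clip))
```

Because downstream code only calls `extract`, changing the `backend` value in the config swaps the extractor without touching the other stages, which is the property the abstract attributes to the common interface.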

Code & Models

Repositories

balaboom123/signdata-slt (GitHub)
