Towards Continuous Sign Language Conversation from Isolated Signs
Youngmin Kim, Kyobin Choo, Jiwoo Park, Minseo Kim, Chanyoung Kim, Junhyeok Kim, Seong Jae Hwang

TL;DR
This paper constructs continuous sign language conversations from isolated signs, combining large-scale datasets, retrieval-guided translation, and a diffusion Transformer to support direct sign-to-sign conversational AI.
Contribution
It presents the SignaVox-W and SignaVox-U datasets, a retrieval-guided translation method, and BRAID, a diffusion Transformer for duration alignment, which together enable sign-to-sign conversational modeling.
Findings
Enhanced motion quality in continuous sign language generation
Improved semantic alignment of sign responses
Scalable signer-centered interaction for sign language AI
Abstract
Sign language is the primary language for many Deaf and Hard-of-Hearing (DHH) signers, yet most conversational AI systems still mediate interaction through spoken or written language. This spoken-language-centered interface can limit access for signers for whom spoken or written language is not the most accessible medium, motivating direct sign-to-sign conversational modeling. However, sentence-level sign video data are expensive to collect and annotate, leaving existing sign translation and production models with limited vocabulary coverage and weak open-domain generalization. We address this bottleneck by constructing continuous sign conversations from isolated signs: large-scale labeled isolated clips are collected as lexically grounded motion primitives and recomposed into sign-language-ordered utterances derived from existing dialogue corpora. We introduce SignaVox-W, which…
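The recomposition idea in the abstract, isolated clips reused as motion primitives and arranged in sign-language order, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `compose_utterance` function, the toy `lexicon`, the gloss labels, and the two-number pose frames are all hypothetical, and the gloss order is simply taken as given. Plain concatenation also ignores transitions between signs and the duration alignment that the summary attributes to BRAID.

```python
from typing import Dict, List, Sequence, Tuple

Frame = List[float]  # one pose frame (assumed: a flat list of keypoint values)
Clip = List[Frame]   # one isolated-sign clip as a sequence of frames


def compose_utterance(
    glosses: Sequence[str], lexicon: Dict[str, Clip]
) -> Tuple[Clip, List[str]]:
    """Concatenate isolated-sign clips in the given gloss order.

    Returns the stitched frame sequence and the glosses that had no clip.
    """
    frames: Clip = []
    missing: List[str] = []
    for gloss in glosses:
        clip = lexicon.get(gloss)
        if clip is None:
            # Out-of-vocabulary gloss: record it instead of guessing a motion.
            missing.append(gloss)
            continue
        frames.extend(clip)
    return frames, missing


# Hypothetical toy lexicon: gloss -> isolated clip (values are placeholders).
lexicon: Dict[str, Clip] = {
    "YOU": [[0.0, 0.1], [0.0, 0.2]],
    "NAME": [[0.5, 0.5]],
    "WHAT": [[0.9, 0.1], [0.9, 0.2], [0.9, 0.3]],
}

frames, missing = compose_utterance(["YOU", "NAME", "WHAT", "TODAY"], lexicon)
print(len(frames), missing)
```

Returning the missing glosses separately makes vocabulary gaps visible, which matters here because the abstract names limited vocabulary coverage as a core bottleneck.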
