Towards Continuous Sign Language Conversation from Isolated Signs

Youngmin Kim; Kyobin Choo; Jiwoo Park; Minseo Kim; Chanyoung Kim; Junhyeok Kim; Seong Jae Hwang

arXiv:2605.14705·cs.CV·May 15, 2026

TL;DR

This paper proposes building continuous sign language conversations from isolated signs, combining large-scale datasets, retrieval-guided translation, and a diffusion Transformer to improve sign language AI systems.

Contribution

The paper presents the SignaVox-W and SignaVox-U datasets, a retrieval-guided translation method, and BRAID, a diffusion Transformer for duration alignment, which together enable sign-to-sign conversational modeling.

Findings

01. Enhanced motion quality in continuous sign language generation
02. Improved semantic alignment of sign responses
03. Scalable signer-centered interaction for sign language AI

Abstract

Sign language is the primary language for many Deaf and Hard-of-Hearing (DHH) signers, yet most conversational AI systems still mediate interaction through spoken or written language. This spoken-language-centered interface can limit access for signers for whom spoken or written language is not the most accessible medium, motivating direct sign-to-sign conversational modeling. However, sentence-level sign video data are expensive to collect and annotate, leaving existing sign translation and production models with limited vocabulary coverage and weak open-domain generalization. We address this bottleneck by constructing continuous sign conversations from isolated signs: large-scale labeled isolated clips are collected as lexically grounded motion primitives and recomposed into sign-language-ordered utterances derived from existing dialogue corpora. We introduce SignaVox-W, which…
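The recomposition idea in the abstract, isolated clips reused as motion primitives and arranged in sign-language order, can be illustrated with a toy sketch. This is not the paper's pipeline: the `Frame`/`Clip` representation, the `blend` and `compose` helpers, the toy lexicon, and the linear-interpolation transition are all assumptions made for illustration, and the linear blend is a simple stand-in rather than the BRAID model.

```python
from typing import Dict, List

Frame = List[float]  # one pose frame: flat list of keypoint coordinates (assumed format)
Clip = List[Frame]   # an isolated sign clip: a sequence of frames


def blend(a: Frame, b: Frame, steps: int) -> Clip:
    """Return `steps` frames linearly interpolated strictly between poses a and b."""
    out: Clip = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return out


def compose(glosses: List[str], lexicon: Dict[str, Clip], transition: int = 2) -> Clip:
    """Concatenate isolated clips in gloss order, skipping glosses with no clip."""
    utterance: Clip = []
    for gloss in glosses:
        clip = lexicon.get(gloss)
        if not clip:
            continue  # out-of-vocabulary gloss: no isolated clip available
        if utterance:
            # Bridge the last pose of the utterance to the first pose of the next clip.
            utterance.extend(blend(utterance[-1], clip[0], transition))
        utterance.extend(clip)
    return utterance


# Hypothetical two-sign lexicon with 2-D "poses"; real clips would hold many keypoints.
lexicon = {
    "YOU": [[0.0, 0.0], [1.0, 0.0]],
    "NAME": [[4.0, 3.0], [4.0, 6.0]],
}
utterance = compose(["YOU", "NAME", "WHAT"], lexicon)
```

In this toy run, `WHAT` has no clip and is skipped, so the result holds two frames for `YOU`, two interpolated transition frames, and two frames for `NAME`. A fixed-length linear blend ignores how long each sign and transition should last, which is the kind of gap a learned duration-alignment model is meant to address.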
