TL;DR
This paper segments continuous sign language videos into individual signs by combining 3D convolutional neural network representations with iterative refinement of temporal segments. It improves sign boundary detection on the BSLCORPUS, PHOENIX14 and BSL-1K datasets and generalizes to new signers, languages and domains.
Contribution
The work combines 3D CNN representations with iterative temporal segment refinement to locate the boundaries between signs, showing considerable improvement over the prior state of the art and generalizing to new signers, languages and domains.
Findings
Considerable improvement over the prior state of the art in locating sign boundaries
Generalization to new signers, languages and domains
Effective segmentation on the BSLCORPUS, PHOENIX14 and BSL-1K datasets
Abstract
The objective of this work is to determine the location of temporal boundaries between signs in continuous sign language videos. Our approach employs 3D convolutional neural network representations with iterative temporal segment refinement to resolve ambiguities between sign boundary cues. We demonstrate the effectiveness of our approach on the BSLCORPUS, PHOENIX14 and BSL-1K datasets, showing considerable improvement over the prior state of the art and the ability to generalise to new signers, languages and domains.
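To make the task concrete, the following is a minimal sketch of turning per-frame boundary scores into sign segments. It is not the paper's method: the 3D CNN and the iterative refinement step are not reproduced here. The scores are assumed to come from some upstream model, and the function names, the 0.5 threshold, and the choice of the midpoint of each above-threshold run as the boundary are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def boundaries_from_scores(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Return one boundary frame per run of consecutive above-threshold frames.

    `scores` holds a hypothetical per-frame boundary probability.
    Each run is collapsed to its middle frame (rounded down).
    """
    boundaries = []
    run_start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if run_start is None:
                run_start = i  # a new run of boundary frames begins
        elif run_start is not None:
            boundaries.append((run_start + i - 1) // 2)  # run ended at i - 1
            run_start = None
    if run_start is not None:  # run reaches the last frame
        boundaries.append((run_start + len(scores) - 1) // 2)
    return boundaries


def segments_from_boundaries(boundaries: List[int], num_frames: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into half-open (start, end) segments at the boundaries."""
    cuts = [0] + [b for b in boundaries if 0 < b < num_frames] + [num_frames]
    return [(start, end) for start, end in zip(cuts, cuts[1:]) if end > start]


# Illustrative scores for an 8-frame clip (made-up values, not real data).
scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.7, 0.1]
boundaries = boundaries_from_scores(scores)
print(boundaries)                                         # [2, 6]
print(segments_from_boundaries(boundaries, len(scores)))  # [(0, 2), (2, 6), (6, 8)]
```

Collapsing each run to a single frame avoids emitting several adjacent boundaries for one transition. A refinement stage such as the one the abstract describes would presumably operate on the resulting segments, but its details are not given in this summary.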
