Stochastic Transformer Networks with Linear Competing Units: Application to end-to-end SL Translation
Andreas Voskou, Konstantinos P. Panousis, Dimitrios Kosmopoulos, Dimitris N. Metaxas, Sotirios Chatzis

TL;DR
This paper introduces a novel end-to-end sign language translation model that eliminates the need for gloss annotations, using stochastic transformer layers with winner-takes-all units, achieving state-of-the-art BLEU-4 scores with reduced memory usage.
Contribution
The paper presents a new Transformer-based SLT model that does not require gloss groundtruth, employing stochastic winner-takes-all layers and variational inference for weights, with efficient compression at inference.
Findings
Achieved top BLEU-4 score on PHOENIX 2014T without gloss supervision.
Reduced memory footprint by over 70%.
Demonstrated effective stochastic layer integration in Transformer networks.
Abstract
Automating sign language translation (SLT) is a challenging real world application. Despite its societal importance, though, research progress in the field remains rather poor. Crucially, existing methods that yield viable performance necessitate the availability of laborious to obtain gloss sequence groundtruth. In this paper, we attenuate this need, by introducing an end-to-end SLT model that does not entail explicit use of glosses; the model only needs text groundtruth. This is in stark contrast to existing end-to-end models that use gloss sequence groundtruth, either in the form of a modality that is recognized at an intermediate model stage, or in the form of a parallel output process, jointly trained with the SLT model. Our approach constitutes a Transformer network with a novel type of layers that combines: (i) local winner-takes-all (LWTA) layers with stochastic winner sampling,…
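As a rough illustration of the first ingredient named in the abstract, the following is a minimal NumPy sketch of a local winner-takes-all (LWTA) layer with stochastic winner sampling. The abstract only names the mechanism, so the details here are assumptions: linear units are grouped into blocks of `block_size` competitors, one winner per block is sampled from a softmax over that block's activations, and only the winner passes its activation on. The function name `stochastic_lwta` and all parameter names are hypothetical, and whatever training-time relaxation or weight posterior the paper uses is not shown.

```python
import numpy as np


def stochastic_lwta(x, W, b, block_size, rng):
    """Sketch of an LWTA layer with sampled winners (assumed formulation).

    x: (batch, d_in) inputs
    W: (d_in, n_blocks * block_size) weights, b: matching bias
    Returns the masked activations and the sampled winner index per block.
    """
    h = x @ W + b                              # linear activations
    batch = h.shape[0]
    h = h.reshape(batch, -1, block_size)       # (batch, n_blocks, block_size)

    # Softmax over the competitors inside each block (numerically stabilised).
    z = h - h.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)

    # Sample one winner per block by inverting the cumulative distribution.
    u = rng.random(size=p.shape[:-1] + (1,))
    winners = (p.cumsum(axis=-1) < u).sum(axis=-1)
    winners = np.minimum(winners, block_size - 1)  # guard against rounding

    # Keep the winner's activation, zero out the losers.
    mask = np.zeros_like(h)
    np.put_along_axis(mask, winners[..., None], 1.0, axis=-1)
    return (h * mask).reshape(batch, -1), winners


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 12))
out, winners = stochastic_lwta(x, W, np.zeros(12), block_size=3, rng=rng)
```

With these shapes, `out` has 12 units arranged as 4 blocks of 3, and at most one unit per block is nonzero for each input row. Because the winner is sampled rather than chosen by argmax, repeated calls on the same input can activate different units.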
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Robot Manipulation and Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Dense Connections · Byte Pair Encoding · Label Smoothing
