Attention Mechanism in Randomized Time Warping
Yutaro Hiraoka, Kazuya Okamura, Kota Suto, Kazuhiro Fukui

TL;DR
This paper demonstrates that Randomized Time Warping (RTW) functions similarly to self-attention mechanisms in Transformers, offering a novel interpretation and showing RTW's superior performance in motion recognition tasks.
Contribution
It reveals that RTW can be interpreted as a form of self-attention, providing a new perspective on RTW's role in sequential pattern analysis and its advantages over traditional self-attention.
Findings
RTW weights correlate highly with self-attention weights (average correlation 0.80).
RTW achieves a 5% performance boost over Transformers on the Something-Something V2 dataset.
RTW operates on entire sequences, unlike local self-attention, leading to performance benefits.
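The first finding compares two weight patterns by their correlation. As a minimal sketch of what such a comparison could look like, the snippet below computes a Pearson correlation between two per-frame weight vectors. The function name, variable names, and numbers are hypothetical and purely illustrative. This summary does not describe the paper's exact procedure (it averages over the ten smallest canonical angles), so this is not a reproduction of it.

```python
import numpy as np

def pearson_correlation(a, b):
    """Pearson correlation between two equal-length weight vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denom = np.linalg.norm(a_centered) * np.linalg.norm(b_centered)
    return float(a_centered @ b_centered / denom)

# Hypothetical per-frame weights for a 6-frame sequence (made-up numbers,
# not taken from the paper).
rtw_weights = np.array([0.05, 0.10, 0.30, 0.35, 0.15, 0.05])
attention_weights = np.array([0.08, 0.12, 0.25, 0.30, 0.15, 0.10])

r = pearson_correlation(rtw_weights, attention_weights)
print(f"correlation between the two weight patterns: {r:.2f}")
```

A value near 1 means both methods emphasize the same frames; a value near 0 means their emphasis is unrelated.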
Abstract
This paper reveals that we can interpret the fundamental function of Randomized Time Warping (RTW) as a type of self-attention mechanism, a core technology of Transformers in motion recognition. Self-attention is a mechanism that enables models to identify and weigh the importance of different parts of an input sequential pattern. RTW, on the other hand, is a general extension of Dynamic Time Warping (DTW), a technique commonly used for matching and comparing sequential patterns. In essence, RTW searches for optimal contribution weights for each element of the input sequential patterns to produce discriminative features. Although the two approaches look different, these contribution weights can be interpreted as self-attention weights. In fact, the two weight patterns look similar, producing a high average correlation of 0.80 across the ten smallest canonical angles. However, they…
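For readers unfamiliar with the two building blocks named in the abstract, here is a minimal sketch of each: textbook scaled dot-product self-attention weights and classic DTW. Neither is the paper's RTW, which this summary does not specify. The function names are hypothetical, and the attention sketch assumes the queries and keys are the raw inputs, with no learned projections.

```python
import numpy as np

def self_attention_weights(x):
    """Row-wise softmax of scaled dot products between sequence elements.

    Assumption: queries and keys are the raw inputs (no learned projections).
    x has shape (T, d); the result has shape (T, T) and each row sums to 1.
    """
    x = np.asarray(x, dtype=float)
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def dtw_distance(a, b):
    """Classic DTW cost between two sequences via dynamic programming."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Toy sequence of 4 frames with 3 features each (random, illustrative only).
frames = np.random.default_rng(0).normal(size=(4, 3))
attn = self_attention_weights(frames)
print(attn.sum(axis=1))  # every row sums to 1

# Same pattern at a different speed: DTW absorbs the repeated element.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Self-attention assigns every element of a sequence a weight relative to every other element, while DTW aligns two sequences by warping the time axis. The abstract's claim is that the contribution weights RTW searches for can be read as attention weights of the first kind.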
