Continuous Action Recognition Based on Sequence Alignment
Kaustubh Kulkarni, Georgios Evangelidis, Jan Cech, Radu Horaud

TL;DR
This paper introduces a novel dynamic frame warping technique for continuous action recognition, enabling simultaneous classification and segmentation of video sequences, and extends speech recognition methods to visual actions.
Contribution
It proposes the dynamic frame warping (DFW) method for isolated action recognition, together with one-pass and two-pass extensions for continuous visual action recognition that adapt techniques from continuous speech recognition.
Findings
DFW outperforms existing methods on RAVEL, Hollywood-1, and Hollywood-2 datasets.
The extensions enable effective recognition with segmentation in continuous videos.
The approach demonstrates competitive accuracy compared to recently published methods.
Abstract
Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be carried out simultaneously. We build on the well-known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on a per-frame representation of videos and on aligning a test sequence with a model sequence. Moreover, we propose two extensions that enable recognition concomitant with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous recognition of speech and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets…
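To make the alignment idea in the abstract concrete, here is a minimal sketch of classic DTW used for isolated recognition. It is not the authors' DFW: the Euclidean per-frame distance, the step pattern, and the names `dtw_cost` and `classify_isolated` are assumptions made for illustration only, and each class is assumed to be represented by a single model sequence of per-frame feature vectors.

```python
import numpy as np


def dtw_cost(test, model):
    """Classic DTW alignment cost between two sequences of per-frame feature vectors.

    Assumption: the local cost is the Euclidean distance between frames; the
    paper's DFW may define frame-to-model costs differently.
    """
    test = np.asarray(test, dtype=float)
    model = np.asarray(model, dtype=float)
    n, m = len(test), len(model)

    # Pairwise frame distances, shape (n, m).
    dist = np.linalg.norm(test[:, None, :] - model[None, :, :], axis=2)

    # acc[i, j] = minimal cost of aligning the first i test frames
    # with the first j model frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return float(acc[n, m])


def classify_isolated(test, models):
    """Return the label whose model sequence aligns with the test sequence at lowest cost."""
    return min(models, key=lambda label: dtw_cost(test, models[label]))


# Hypothetical toy data: 1-D "features" standing in for per-frame descriptors.
models = {"wave": [[0], [1], [2]], "clap": [[5], [5], [5]]}
test_sequence = [[0], [0], [1], [2], [2]]
predicted = classify_isolated(test_sequence, models)
```

This sketch handles only a pre-segmented test sequence. Continuous recognition, as the abstract describes it, additionally has to decide where one action ends and the next begins, which is the role of the one-pass and two-pass extensions.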
