# Continuous Video to Simple Signals for Swimming Stroke Detection with Convolutional Neural Networks

**Authors:** Brandon Victor, Zhen He, Stuart Morgan, Dino Miniutti

arXiv: 1705.09894 · 2017-05-30

## TL;DR

The paper trains a CNN to map a sliding window of video frames to a point on a smooth 1D signal whose peaks mark swimming strokes. The authors report that the resulting signals predict strokes at least as accurately as humans, and that the same process, with no change to architecture or training, also detects tennis strokes.

## Contribution

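To the authors' knowledge, this is the first work, in sports analysis or in fundamental computer vision research, to train and apply a CNN that maps windows of frames to a smooth target signal in order to detect discrete events in continuous video. Prior work has mostly focused on action recognition, using it to classify many clips for action localisation.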

## Key findings

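- Detects swimming strokes in continuous, in-the-wild video.
- Detects tennis strokes equally well with no change to the model architecture or training method.
- Produces smooth output signals that predict events at least as accurately as humans, based on manual evaluation of a sample of negative results.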

## Abstract

In many sports, it is useful to analyse video of an athlete in competition for training purposes. In swimming, stroke rate is a common metric used by coaches; requiring a laborious labelling of each individual stroke. We show that using a Convolutional Neural Network (CNN) we can automatically detect discrete events in continuous video (in this case, swimming strokes). We create a CNN that learns a mapping from a window of frames to a point on a smooth 1D target signal, with peaks denoting the location of a stroke, evaluated as a sliding window. To our knowledge this process of training and utilizing a CNN has not been investigated before; either in sports or fundamental computer vision research. Most research has been focused on action recognition and using it to classify many clips in continuous video for action localisation. In this paper we demonstrate our process works well on the task of detecting swimming strokes in the wild. However, without modifying the model architecture or training method, the process is also shown to work equally well on detecting tennis strokes, implying that this is a general process. The outputs of our system are surprisingly smooth signals that predict an arbitrary event at least as accurately as humans (manually evaluated from a sample of negative results). A number of different architectures are evaluated, pertaining to slightly different problem formulations and signal targets.
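The signal formulation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the Gaussian bump shape, the `sigma` width, the window length, the choice of the window centre as the target frame, and all function names are assumptions made for illustration. The CNN itself is left out; the sketch only shows how a smooth 1D target might be built from labelled stroke frames, how sliding windows could be paired with target points, and how peaks in a predicted signal could be turned back into strokes and a stroke rate.

```python
import numpy as np


def make_target_signal(stroke_frames, n_frames, sigma=3.0):
    """Build a smooth 1D target with one bump per labelled stroke frame.

    The Gaussian shape and sigma are illustrative assumptions.
    """
    t = np.arange(n_frames, dtype=float)
    signal = np.zeros(n_frames)
    for s in stroke_frames:
        bump = np.exp(-0.5 * ((t - s) / sigma) ** 2)
        signal = np.maximum(signal, bump)  # overlapping bumps keep the larger value
    return signal


def sliding_windows(n_frames, window=8):
    """Yield (start, stop, centre) frame indices for every full window.

    A model would see frames[start:stop] and regress target[centre];
    using the centre frame is an assumption of this sketch.
    """
    for start in range(n_frames - window + 1):
        yield start, start + window, start + window // 2


def find_peaks_simple(signal, threshold=0.5):
    """Return indices of local maxima that exceed the threshold."""
    s = np.asarray(signal, dtype=float)
    i = np.arange(1, len(s) - 1)
    is_peak = (s[i] > s[i - 1]) & (s[i] >= s[i + 1]) & (s[i] > threshold)
    return i[is_peak].tolist()


def stroke_rate_per_minute(peaks, fps):
    """Convert peak frame indices to strokes per minute; None if fewer than two peaks."""
    if len(peaks) < 2:
        return None
    mean_gap_frames = float(np.mean(np.diff(peaks)))
    return 60.0 * fps / mean_gap_frames


if __name__ == "__main__":
    target = make_target_signal([20, 50, 80], n_frames=100)
    peaks = find_peaks_simple(target)
    print(peaks, stroke_rate_per_minute(peaks, fps=25))
```

In practice the peak finder would be run on the model's predicted signal rather than on the target; running it on the target here simply checks that the labelled stroke frames can be recovered.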

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.09894/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1705.09894/full.md

---
Source: https://tomesphere.com/paper/1705.09894