# The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos

**Authors:** Hazel Doughty, Walterio Mayol-Cuevas, Dima Damen

arXiv: 1812.05538 · 2019-04-11

## TL;DR

This paper introduces a rank-aware temporal attention model that attends to the skill-relevant parts of long videos to determine relative skill, outperforming previous methods and classic softmax attention on EPIC-Skills and a newly annotated YouTube dataset.

## Contribution

The paper proposes a new learnable temporal attention approach with a rank-aware loss for skill assessment in long videos, trained with only video-level supervision.
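To make the ingredients concrete, here is a minimal sketch of attention-weighted pooling over video segments combined with a pairwise margin ranking loss. It is an illustration under stated assumptions, not the paper's method: it uses plain softmax attention (which the abstract names as a baseline) and a standard margin ranking loss rather than the proposed rank-aware loss. All function names, the margin value, and the toy numbers are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attended_score(segment_scores, attention_logits):
    """Video-level score: attention-weighted sum of per-segment scores."""
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, segment_scores))

def margin_ranking_loss(score_better, score_worse, margin=1.0):
    """Zero once the better video outscores the worse one by at least `margin`."""
    return max(0.0, margin - (score_better - score_worse))

# Toy pair (assumed values): only the video-level ordering is supervised.
better = attended_score([0.2, 2.0, 0.1], [0.0, 3.0, 0.0])  # attention peaks on segment 2
worse = attended_score([0.1, 0.3, 0.2], [0.0, 0.0, 0.0])   # uniform attention
loss = margin_ranking_loss(better, worse)
```

Because the loss depends only on which video in a pair is ranked higher, no segment-level labels are needed; the attention weights are shaped indirectly by the video-level ranking signal.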

## Key findings

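- Outperforms previous approaches and classic softmax attention by over 4% pairwise accuracy on both EPIC-Skills and the newly annotated YouTube dataset
- Achieves as much as 12% improvement on individual tasks
- Demonstrates that the model attends to rank-aware parts of the video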
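The results are reported as pairwise accuracy. This summary does not define the metric, so the sketch below assumes the common reading: the fraction of annotated (better, worse) video pairs that the model orders correctly. The function name, video identifiers, and scores are hypothetical.

```python
def pairwise_accuracy(scores, ranked_pairs):
    """Fraction of (better, worse) pairs where the better video scores higher."""
    if not ranked_pairs:
        raise ValueError("need at least one annotated pair")
    correct = sum(
        1 for better, worse in ranked_pairs if scores[better] > scores[worse]
    )
    return correct / len(ranked_pairs)

# Assumed predicted skill scores per video.
scores = {"video_a": 0.9, "video_b": 0.4, "video_c": 0.6}
# Assumed annotations: the first video in each pair shows higher skill.
pairs = [("video_a", "video_b"), ("video_a", "video_c"), ("video_b", "video_c")]
acc = pairwise_accuracy(scores, pairs)  # the third pair is ordered incorrectly
```

Under this reading, a gain of 4% means that four more pairs per hundred are ordered correctly.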

## Abstract

We present a new model to determine relative skill from long videos, through learnable temporal attention modules. Skill determination is formulated as a ranking problem, making it suitable for common and generic tasks. However, for long videos, parts of the video are irrelevant for assessing skill, and there may be variability in the skill exhibited throughout a video. We therefore propose a method which assesses the relative overall level of skill in a long video by attending to its skill-relevant parts. Our approach trains temporal attention modules, learned with only video-level supervision, using a novel rank-aware loss function. In addition to attending to task relevant video parts, our proposed loss jointly trains two attention modules to separately attend to video parts which are indicative of higher (pros) and lower (cons) skill. We evaluate our approach on the EPIC-Skills dataset and additionally annotate a larger dataset from YouTube videos for skill determination with five previously unexplored tasks. Our method outperforms previous approaches and classic softmax attention on both datasets by over 4% pairwise accuracy, and as much as 12% on individual tasks. We also demonstrate our model's ability to attend to rank-aware parts of the video.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.05538/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1812.05538/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1812.05538/full.md

---
Source: https://tomesphere.com/paper/1812.05538