# YoTube: Searching Action Proposal via Recurrent and Static Regression Networks

**Authors:** Hongyuan Zhu, Romain Vial, Shijian Lu, Yonghong Tian, Xianbin Cao

arXiv: 1706.08218 · 2018-04-04

## TL;DR

YoTube is a novel framework combining recurrent and static neural networks to generate accurate action proposals in untrimmed videos by exploiting appearance, motion, and temporal cues.

## Contribution

The paper introduces a fusion of recurrent and static detectors trained on RGB and optical flow for improved action proposal generation in untrimmed videos.

## Key findings

- Outperforms prior state-of-the-art methods on the UCF-101 and UCF-Sports datasets.
- Effectively handles untrimmed videos with a novel trimming method.
- Combines temporal dynamics and appearance cues for robust proposals.

## Abstract

In this paper, we present YoTube, a novel network fusion framework for searching action proposals in untrimmed videos, where each action proposal corresponds to a spatial-temporal video tube that potentially locates one human action. Our method consists of a recurrent YoTube detector and a static YoTube detector, where the recurrent YoTube explores the regression capability of RNNs for candidate bounding box prediction using learnt temporal dynamics, and the static YoTube produces bounding boxes using rich appearance cues in a single frame. Both networks are trained using RGB and optical flow in order to fully exploit the rich appearance, motion, and temporal context, and their outputs are fused to produce accurate and robust proposal boxes. Action proposals are finally constructed by linking these boxes using dynamic programming with a novel trimming method to handle untrimmed video effectively and efficiently. Extensive experiments on the challenging UCF-101 and UCF-Sports datasets show that our proposed technique obtains superior performance compared with the state-of-the-art.
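
To make the linking step in the abstract concrete, here is a minimal Python sketch of joining per-frame boxes into one tube with dynamic programming (a Viterbi-style pass). It is an illustration under stated assumptions, not the authors' implementation: the path score (sum of box confidences plus weighted IoU between consecutive boxes), the names `iou` and `link_boxes`, and the `overlap_weight` parameter are all hypothetical, and the paper's trimming method is not reproduced here because this summary does not describe it.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_boxes(
    boxes: List[List[Box]],
    scores: List[List[float]],
    overlap_weight: float = 1.0,  # assumed trade-off, not taken from the paper
) -> Tuple[List[int], float]:
    """Pick one box per frame maximizing an assumed path score:
    sum of confidences + overlap_weight * IoU of consecutive boxes.
    Assumes every frame has at least one candidate box.
    Returns (chosen box index per frame, total path score)."""
    best = [list(scores[0])]  # best[t][j]: best score of a path ending at box j
    back: List[List[int]] = []  # back[t-1][j]: predecessor of box j in frame t
    for t in range(1, len(boxes)):
        cur_best, cur_back = [], []
        for j, box in enumerate(boxes[t]):
            cands = [
                best[t - 1][i] + overlap_weight * iou(prev, box)
                for i, prev in enumerate(boxes[t - 1])
            ]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            cur_best.append(cands[i_star] + scores[t][j])
            cur_back.append(i_star)
        best.append(cur_best)
        back.append(cur_back)

    # Backtrack from the best-scoring box in the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for t in range(len(boxes) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(j)
    path.reverse()
    return path, total
```

Calling `link_boxes(boxes, scores)` with one list of candidate boxes and one list of confidences per frame returns the index of the selected box in each frame. Further proposals could be extracted by removing the selected boxes and repeating the pass, which is a common pattern in tube linking but is likewise an assumption here.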

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.08218/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1706.08218/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1706.08218/full.md

---
Source: https://tomesphere.com/paper/1706.08218