# Exploring Feature Representation and Training strategies in Temporal Action Localization

**Authors:** Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras

arXiv: 1905.10608 · 2019-05-30

## TL;DR

This paper ablates feature extraction methods, fixed-size feature representation methods, and training strategies to measure how each affects temporal action localization performance. Based on the findings, it proposes a two-stage detector that outperforms the prior state of the art on THUMOS14.

## Contribution

The paper provides an ablation study covering feature extraction methods, fixed-size feature representation methods, and training strategies, and introduces a two-stage detector built on those findings.

## Key findings

- The choice of feature extraction method affects localization performance.
- The choice of fixed-size feature representation method affects localization performance.
- The training strategy affects localization performance.
- The proposed two-stage detector achieves 44.2% mAP@tIoU=0.5 on THUMOS14.
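
The reported metric, mAP@tIoU=0.5, counts a predicted segment as a correct detection only when its temporal Intersection over Union (tIoU) with a ground-truth segment of the same class reaches 0.5. The sketch below illustrates the standard tIoU definition only; it is not the authors' evaluation code, and the function names `temporal_iou` and `meets_threshold` are hypothetical. Full mAP additionally ranks predictions by confidence, matches each ground-truth segment at most once, and averages the per-class average precision, none of which is shown here.

```python
def temporal_iou(pred, gt):
    """Return the temporal IoU of two (start, end) segments, e.g. in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def meets_threshold(pred, gt, threshold=0.5):
    """Check whether a prediction overlaps a ground-truth segment enough to count."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # Illustrative segments only, not data from the paper.
    print(temporal_iou((2.0, 10.0), (4.0, 10.0)))     # 6 s overlap / 8 s union
    print(meets_threshold((0.0, 10.0), (5.0, 15.0)))  # 5 s overlap / 15 s union
```

With a threshold of 0.5, a prediction that covers only half of a ground-truth action while extending equally far beyond it would not count, so the metric penalizes imprecise boundaries as well as missed actions.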

## Abstract

Temporal action localization has recently attracted significant interest in the Computer Vision community. However, despite the great progress, it is hard to identify which aspects of the proposed methods contribute most to the increase in localization performance. To address this issue, we conduct ablative experiments on feature extraction methods, fixed-size feature representation methods and training strategies, and report how each influences the overall performance. Based on our findings, we propose a two-stage detector that outperforms the state of the art in THUMOS14, achieving a mAP@tIoU=0.5 equal to 44.2%.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.10608/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1905.10608/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1905.10608/full.md

---
Source: https://tomesphere.com/paper/1905.10608