Multi-shot Temporal Event Localization: a Benchmark

Xiaolong Liu (1), Yao Hu (2), Song Bai (2,3), Fei Ding (2), Xiang Bai (1), Philip H.S. Torr (3) ((1) Huazhong University of Science and Technology, (2) Alibaba Group, (3) University of Oxford)

arXiv:2012.09434 · cs.CV · April 16, 2021 · 5 citations

TL;DR

This paper introduces the new task of multi-shot temporal event localization, presents a large-scale dataset called MUSES with extensive shot variations, and evaluates current methods, showing significant room for improvement.

Contribution

It defines the novel task of multi-shot temporal event localization, provides the MUSES dataset with frequent shot cuts and the resulting shot variation, and offers a simple baseline for handling intra-instance variation.

Findings

01. The state-of-the-art temporal action localization method achieves only 13.1% mAP at IoU=0.5 on MUSES.

02. The proposed baseline improves performance to 18.9% mAP on MUSES.

03. The dataset contains 31,477 event instances over 716 hours of video.
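As a rough illustration of what "mAP at IoU=0.5" measures in temporal localization, the sketch below computes the temporal IoU (tIoU) of two segments and a per-class average precision in which a prediction is a true positive only if it overlaps a not-yet-matched ground-truth instance with tIoU of at least 0.5; mAP is then the mean over event classes. The function names are hypothetical and this is a simplified, non-interpolated AP. It is not the MUSES evaluation code, and standard toolkits (for example the THUMOS14 or ActivityNet evaluation scripts) may interpolate the precision-recall curve, so numbers can differ slightly.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[Segment, float]],
                      gts: List[Segment],
                      iou_thr: float = 0.5) -> float:
    """Non-interpolated AP for one class (hypothetical helper, not the official metric code)."""
    if not gts:
        return 0.0
    # Rank predictions by confidence, highest first.
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    matched = [False] * len(gts)
    ap, tp = 0.0, 0
    for rank, (seg, _score) in enumerate(ranked, start=1):
        # Greedily match to the unmatched ground truth with the highest tIoU.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = temporal_iou(seg, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            tp += 1
            ap += tp / rank  # precision at this true-positive rank
    # Unmatched ground truths contribute zero, which penalizes missed events.
    return ap / len(gts)


def mean_ap(per_class: Dict[str, Tuple[List[Tuple[Segment, float]], List[Segment]]],
            iou_thr: float = 0.5) -> float:
    """Mean of per-class APs; per_class maps class name -> (predictions, ground truths)."""
    aps = [average_precision(p, g, iou_thr) for p, g in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, `temporal_iou((0, 10), (5, 15))` is 5 / 15, about 0.33, so that prediction would not count as a hit at the 0.5 threshold, whereas `(20, 28)` against a ground truth of `(20, 30)` gives 0.8 and would.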

Abstract

Current developments in temporal event or action localization usually target actions captured by a single camera. However, many events or actions in the wild may be captured as a sequence of shots by multiple cameras at different positions. In this paper, we propose a new and challenging task called multi-shot temporal event localization and, accordingly, collect a large-scale dataset called MUlti-Shot EventS (MUSES). MUSES has 31,477 event instances for a total of 716 video hours. The core nature of MUSES is its frequent shot cuts, averaging 19 shots per instance and 176 shots per video, which induce large intra-instance variations. Our comprehensive evaluations show that the state-of-the-art method in temporal action localization achieves an mAP of only 13.1% at IoU=0.5. As a minor contribution, we present a simple baseline approach for handling the intra-instance…
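To make the "shots per instance" statistic concrete, the sketch below counts how many shots an annotated event instance spans, assuming the video's shot-cut timestamps are already known and sorted (for example from annotations or a shot-boundary detector). Each cut that falls strictly inside the instance starts a new shot, so the count is one plus the number of such cuts. The helper names are hypothetical, and the abstract does not say how MUSES derived its averages of 19 shots per instance and 176 per video.

```python
import bisect
from typing import List, Tuple


def shots_in_instance(cut_times: List[float], start: float, end: float) -> int:
    """Number of shots overlapping [start, end], given sorted shot-cut timestamps.

    Hypothetical helper: cuts exactly at start or end do not add a shot.
    """
    lo = bisect.bisect_right(cut_times, start)  # first cut strictly after start
    hi = bisect.bisect_left(cut_times, end)     # first cut at or after end
    return 1 + max(0, hi - lo)


def average_shots_per_instance(cut_times: List[float],
                               instances: List[Tuple[float, float]]) -> float:
    """Mean shot count over a video's annotated instances (hypothetical helper)."""
    if not instances:
        return 0.0
    counts = [shots_in_instance(cut_times, s, e) for s, e in instances]
    return sum(counts) / len(counts)
```

With cuts at 5 s, 12 s and 20 s, an instance spanning 3 s to 15 s covers three shots (3 to 5, 5 to 12, 12 to 15), while one spanning exactly 5 s to 12 s stays within a single shot. Binary search keeps each count logarithmic in the number of cuts, which matters for long videos with hundreds of shots.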

Code & Models

Repositories

xlliu7/muses
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization