Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu (1), Yao Hu (2), Song Bai (2,3), Fei Ding (2), Xiang Bai (1), Philip H.S. Torr (3) ((1) Huazhong University of Science and Technology, (2) Alibaba Group, (3) University of Oxford)

TL;DR
This paper introduces the new task of multi-shot temporal event localization, presents a large-scale dataset called MUSES with extensive shot variations, and evaluates current methods showing significant room for improvement.
Contribution
It defines the new task of multi-shot temporal event localization, provides the MUSES dataset with frequent shot cuts, and offers a simple baseline approach for handling intra-instance variations.
Findings
The state-of-the-art temporal action localization method achieves only 13.1% mAP at IoU=0.5 on MUSES.
The proposed baseline improves performance to 18.9% mAP on MUSES.
The dataset contains 31,477 event instances over 716 hours of video.
Abstract
Current developments in temporal event or action localization usually target actions captured by a single camera. However, extensive events or actions in the wild may be captured as a sequence of shots by multiple cameras at different positions. In this paper, we propose a new and challenging task called multi-shot temporal event localization, and accordingly collect a large-scale dataset called MUlti-Shot EventS (MUSES). MUSES has 31,477 event instances for a total of 716 video hours. The core nature of MUSES is its frequent shot cuts, with an average of 19 shots per instance and 176 shots per video, which induce large intra-instance variations. Our comprehensive evaluations show that the state-of-the-art method in temporal action localization only achieves an mAP of 13.1% at IoU=0.5. As a minor contribution, we present a simple baseline approach for handling the intra-instance…
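For readers unfamiliar with the "mAP at IoU=0.5" figure quoted in the abstract, the sketch below shows the temporal IoU test that usually sits underneath it in temporal action localization: a predicted segment counts as a match for a ground-truth instance when their overlap divided by their union reaches the threshold. This is a minimal illustration and not the authors' evaluation code; the function names `temporal_iou` and `is_match` and the (start, end)-in-seconds representation are assumptions made here. A full mAP computation would additionally rank predictions by confidence and enforce one-to-one matching per class, which is omitted.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two segments given as (start, end) in seconds.

    Hypothetical helper for illustration; not taken from the MUSES code.
    """
    (pred_start, pred_end), (gt_start, gt_end) = pred, gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """True when the prediction overlaps the ground truth at or above the threshold."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # A prediction of 10-30 s against an annotated event of 15-35 s:
    # 15 s of overlap over a 25 s union.
    print(temporal_iou((10.0, 30.0), (15.0, 35.0)))
    print(is_match((10.0, 30.0), (15.0, 35.0)))
```

Because the test depends only on segment boundaries, an event spanning many shots is judged as one interval, so a method that fragments the event at shot cuts will tend to fall below the threshold.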
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization
