TL;DR
This paper presents MLB-YouTube, a new dataset for fine-grained activity detection in baseball videos. It compares recognition approaches on segmented video classification, activity detection in continuous videos, and pitch speed and pitch type prediction, and finds that learning temporal structure is valuable.
Contribution
Introduction of the MLB-YouTube dataset for fine-grained activity recognition and comprehensive evaluation of recognition approaches on multiple tasks.
Findings
Temporal structure improves activity recognition accuracy.
Models are compared on the extremely difficult task of predicting pitch speed and pitch type from broadcast videos.
The dataset enables challenging fine-grained activity detection tasks.
Abstract
In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection. The dataset contains two settings: segmented video classification as well as activity detection in continuous videos. We experimentally compare various recognition approaches capturing temporal structure in activity videos, by classifying segmented videos and extending those approaches to continuous videos. We also compare models on the extremely difficult task of predicting pitch speed and pitch type from broadcast baseball videos. We find that learning temporal structure is valuable for fine-grained activity recognition.
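The abstract does not say which models are compared, so the following is not the paper's method. It is a minimal, illustrative sketch of why temporal structure can matter: averaging per-frame features discards frame order, while a simple temporal pyramid (an assumed stand-in technique) keeps a coarse record of it. The function names, the two-dimensional toy features, and the `levels` parameter are all hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector per video frame (toy data here)


def mean_pool(frames: Sequence[Frame]) -> List[float]:
    """Average per-frame features over time; the result ignores frame order."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def pyramid_pool(frames: Sequence[Frame], levels: int = 2) -> List[float]:
    """Concatenate mean-pooled features over 1, 2, 4, ... equal time segments.

    Hypothetical illustration only: level 0 pools the whole clip, level 1
    pools each half separately, and so on, so coarse ordering is preserved.
    """
    n = len(frames)
    if n < 2 ** (levels - 1):
        raise ValueError("not enough frames for the requested pyramid depth")
    pooled: List[float] = []
    for level in range(levels):
        parts = 2 ** level
        for p in range(parts):
            start = p * n // parts
            end = (p + 1) * n // parts
            pooled.extend(mean_pool(frames[start:end]))
    return pooled


# Two toy clips containing the same frames in opposite order.
clip = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
reversed_clip = list(reversed(clip))

# Mean pooling cannot tell the two clips apart; pyramid pooling can.
same_under_mean = mean_pool(clip) == mean_pool(reversed_clip)
same_under_pyramid = pyramid_pool(clip) == pyramid_pool(reversed_clip)
```

In this toy setup, `same_under_mean` is true and `same_under_pyramid` is false, which is the sense in which an order-aware representation can separate activities that an order-free average merges.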
