Learning to Localize Temporal Events in Large-scale Video Data
Mikel Bober-Irizar, Miha Skalic, David Austin

TL;DR
This paper addresses temporal localization of specified events in large-scale video data. It uses gradient boosted decision trees and deep learning models, with applications such as video search.
Contribution
It introduces two approaches to temporal event localization in large-scale video. The first is a gradient boosted decision tree model trained on a crafted dataset. The second is a combination of deep learning models built on frame-level data, video-level data, and a localization model.
Findings
Achieved 5th place in the 3rd Youtube-8M video recognition challenge
Demonstrated effectiveness of combined models for event localization
Provided insights into large-scale video event detection
Abstract
We address temporal localization of events in large-scale video data, in the context of the Youtube-8M Segments dataset. This emerging field within video recognition can enable applications to identify the precise time at which a specified event occurs in a video, which has broad implications for video search. To address this, we present two separate approaches: (1) a gradient boosted decision tree model on a crafted dataset and (2) a combination of deep learning models based on frame-level data, video-level data, and a localization model. The combination of these two approaches achieved 5th place in the 3rd Youtube-8M video recognition challenge.
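The abstract does not state how the two approaches were combined. The following is a minimal sketch under two assumptions: each approach outputs a score in [0, 1] for every (segment, class) pair, and the combination is a simple weighted average. Weighted averaging is a common ensembling choice, but the source does not confirm it. The function names and the 0.5 weight are hypothetical. The sketch also shows how blended scores can be turned into a per-class ranking of segments, which is the retrieval step a video search application would need.

```python
import numpy as np


def blend_segment_scores(gbdt_scores, dl_scores, weight=0.5):
    """Weighted average of per-segment class scores from two models (hypothetical helper).

    Both inputs have shape (num_segments, num_classes). `weight` is an assumed
    blending weight applied to the gradient boosted tree scores.
    """
    gbdt_scores = np.asarray(gbdt_scores, dtype=float)
    dl_scores = np.asarray(dl_scores, dtype=float)
    if gbdt_scores.shape != dl_scores.shape:
        raise ValueError("score arrays must have the same shape")
    return weight * gbdt_scores + (1.0 - weight) * dl_scores


def top_segments_per_class(scores, k):
    """For each class, return indices of the k highest-scoring segments, best first."""
    scores = np.asarray(scores, dtype=float)
    # Sort segment indices by descending score within each class column.
    order = np.argsort(-scores, axis=0, kind="stable")
    # Keep the first k rows, then transpose to shape (num_classes, k).
    return order[:k].T


# Toy scores for 3 segments and 2 classes (illustrative numbers, not from the paper).
gbdt = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.5]]
dl = [[0.7, 0.3], [0.4, 0.6], [0.1, 0.2]]
blended = blend_segment_scores(gbdt, dl)
print(top_segments_per_class(blended, k=2))
```

In practice the blending weight would be tuned on held-out data. A rank-based average is a common alternative when the two models' scores are not calibrated on the same scale.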
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Music and Audio Processing
