Learning to Localize Temporal Events in Large-scale Video Data

Mikel Bober-Irizar; Miha Skalic; David Austin

arXiv:1910.11631·cs.CV·October 28, 2019

TL;DR

This paper tackles temporal localization of events in large-scale video, in the context of the YouTube-8M Segments dataset. It combines a gradient boosted decision tree model with deep learning models to identify when a specified event occurs in a video, with applications such as video search.

Contribution

It presents two separate approaches to temporal event localization in videos: a gradient boosted decision tree model trained on a crafted dataset, and a combination of deep learning models based on frame-level data, video-level data, and a localization model.

Findings

01 Achieved 5th place in the 3rd YouTube-8M video recognition challenge

02 Demonstrated effectiveness of combined models for event localization

03 Provided insights into large-scale video event detection

Abstract

We address temporal localization of events in large-scale video data, in the context of the YouTube-8M Segments dataset. This emerging field within video recognition can enable applications to identify the precise time a specified event occurs in a video, which has broad implications for video search. To address this, we present two separate approaches: (1) a gradient boosted decision tree model on a crafted dataset and (2) a combination of deep learning models based on frame-level data, video-level data, and a localization model. The combinations of these two approaches achieved 5th place in the 3rd YouTube-8M video recognition challenge.
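To make the idea of combining approaches concrete, here is a minimal sketch of one way segment scores from two models could be blended and ranked. It is not the authors' pipeline: the function names, the multiplicative gating by a video-level probability, the equal blend weight, and the 5-second segment length are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def gate_by_video_score(video_prob: float, segment_probs: List[float]) -> List[float]:
    """Scale each segment's localization score by a video-level probability.

    Hypothetical combination rule: a segment only scores highly if the
    video as a whole is also likely to contain the event.
    """
    return [video_prob * p for p in segment_probs]


def blend(a: List[float], b: List[float], weight_a: float = 0.5) -> List[float]:
    """Weighted average of two models' per-segment scores (assumed scheme)."""
    if len(a) != len(b):
        raise ValueError("both models must score the same segments")
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b)]


def top_segments(
    scores: List[float], segment_seconds: int = 5, k: int = 2
) -> List[Tuple[int, float]]:
    """Return (start_time_in_seconds, score) for the k best segments."""
    ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
    return [(index * segment_seconds, score) for index, score in ranked[:k]]


# Made-up scores for one class over four consecutive segments of one video.
deep_scores = gate_by_video_score(0.8, [0.1, 0.9, 0.5, 0.2])
tree_scores = [0.2, 0.6, 0.7, 0.1]  # stand-in for a tree model's outputs

blended = blend(deep_scores, tree_scores, weight_a=0.5)
best = top_segments(blended, k=2)
print(best)
```

With these made-up inputs, the segments starting at 5 and 10 seconds rank highest. In practice the blend weight would be tuned on held-out data rather than fixed at 0.5.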

Code & Models

Repositories

mxbi/youtube8m-2019 · tf · Official

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Music and Audio Processing