Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset

Ankit Shah; Harini Kesavamoorthy; Poorva Rane; Pramati Kalwad; Alexander Hauptmann; Florian Metze

arXiv:1809.00241·cs.CV·September 14, 2018·1 cites

Open Access

TL;DR

This paper addresses activity recognition in 3-second video clips from the Moments in Time dataset, combining multimodal features with fusion techniques to reach 89.23% Top-5 accuracy on 20 classes.

Contribution

It introduces an approach that combines visual-based textual features with fusion methods to improve activity classification.

Findings

01. Achieved 89.23% Top-5 accuracy on 20 classes

02. Outperformed the baseline TRN model significantly

03. Demonstrated the effectiveness of visual-textual fusion techniques

Abstract

Moments capture a huge part of our lives. Accurate recognition of these moments is challenging due to their diverse and complex interpretations. Action recognition is the task of classifying the action or activity present in a given video. In this work, we perform experiments on the Moments in Time dataset to accurately recognize activities occurring in 3-second clips. We use state-of-the-art techniques for visual, auditory, and spatio-temporal localization and develop a method to accurately classify the activities in the Moments in Time dataset. Our novel approach of using visual-based textual features and fusion techniques performs well, providing an overall 89.23% Top-5 accuracy on the 20 classes, a significant improvement over the baseline TRN model.
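
For readers unfamiliar with the metric quoted in the abstract, Top-5 accuracy counts a clip as correct when its true class is among the five highest-scoring predictions. The sketch below is illustrative only and is not taken from the paper: the function name `top_k_accuracy`, the toy scores, and the six-class setup are all assumptions made for the example.

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of clips whose true label is among the k highest-scoring classes.

    scores: list of per-clip score lists, one score per class (hypothetical data).
    labels: list of true class indices, one per clip.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one clip")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data: 3 clips, 6 made-up classes (the paper itself uses 20 classes).
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # true class 2 has the highest score
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true class 5 has the lowest score
    [0.12, 0.30, 0.05, 0.25, 0.20, 0.08],  # true class 0 has the 4th highest score
]
labels = [2, 5, 0]

top5 = top_k_accuracy(scores, labels, k=5)
top1 = top_k_accuracy(scores, labels, k=1)
print(f"Top-5: {top5:.4f}, Top-1: {top1:.4f}")
```

In this toy case two of the three clips have their true class within the top five scores but only one has it ranked first, which shows why Top-5 figures are typically higher than Top-1 figures for the same model.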

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization