Object and Text-guided Semantics for CNN-based Activity Recognition
Sungmin Eum, Christopher Reale, Heesung Kwon, Claire Bonial, Clare Voss

TL;DR
This paper introduces a novel CNN for activity recognition that integrates object recognition and text-guided semantics in a multitask learning framework, enhancing recognition accuracy by leveraging large text corpora.
Contribution
It proposes a new end-to-end multitask CNN that incorporates text-guided semantic information to select relevant objects for improved activity recognition.
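To make the multitask idea concrete, here is a minimal sketch of a joint objective for the two tasks. This summary does not state the paper's loss, so the weighted sum of two cross-entropy terms below is an assumption (a common multitask formulation), and the names `multitask_loss` and `object_weight` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(activity_logits, activity_label,
                   object_logits, object_label, object_weight=0.5):
    """Assumed joint objective: activity loss plus a weighted object loss.

    In a shared-backbone network, both heads would read the same features,
    so gradients from the object term also shape the activity features.
    """
    activity_term = cross_entropy(activity_logits, activity_label)
    object_term = cross_entropy(object_logits, object_label)
    return activity_term + object_weight * object_term


if __name__ == "__main__":
    # Toy logits for 3 activity classes and 4 object classes.
    print(multitask_loss([2.0, 0.5, 0.1], 0, [0.2, 1.5, 0.3, 0.1], 1))
```

The `object_weight` value controls how strongly the auxiliary object task influences training; 0.5 is an arbitrary placeholder, not a value from the paper.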
Findings
Enhanced activity recognition performance over baseline models
Effective use of text-guided semantics for object relevance selection
To the authors' knowledge, the first investigation of text-guided semantic integration in CNNs for activity recognition
Abstract
Many previous methods have demonstrated the importance of considering semantically relevant objects for carrying out video-based human activity recognition, yet none of the methods have harvested the power of large text corpora to relate the objects and the activities to be transferred into learning a unified deep convolutional neural network. We present a novel activity recognition CNN which co-learns the object recognition task in an end-to-end multitask learning scheme to improve upon the baseline activity recognition performance. We further improve upon the multitask learning approach by exploiting a text-guided semantic space to select the most relevant objects with respect to the target activities. To the best of our knowledge, we are the first to investigate this approach.
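The object-selection step described in the abstract can be illustrated with a small sketch: rank candidate objects by their similarity to the target activities in a shared embedding space and keep the top few. The abstract does not specify the similarity measure or the embedding model, so cosine similarity, the max-over-activities scoring rule, and the tiny hand-made vectors below are all assumptions for illustration; in practice the vectors would come from a model trained on a large text corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_relevant_objects(activity_vecs, object_vecs, k):
    """Return the k objects most similar to any target activity.

    Each object is scored by its best cosine similarity over all
    activities (an assumed scoring rule, not taken from the paper).
    """
    scores = {
        obj: max(cosine(vec, act) for act in activity_vecs.values())
        for obj, vec in object_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical 3-dimensional "embeddings", invented for this example only.
TOY_ACTIVITIES = {
    "riding_bike": [1.0, 0.0, 0.1],
    "cooking": [0.0, 1.0, 0.1],
}
TOY_OBJECTS = {
    "bicycle": [0.9, 0.1, 0.0],
    "frying_pan": [0.1, 0.9, 0.0],
    "cloud": [0.0, 0.1, 1.0],
}

if __name__ == "__main__":
    print(select_relevant_objects(TOY_ACTIVITIES, TOY_OBJECTS, k=2))
```

Only the selected objects would then serve as labels for the auxiliary object recognition task, so that weakly related categories (here, "cloud") do not contribute to training.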
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Anomaly Detection Techniques and Applications
