Seeing What You're Told: Sentence-Guided Activity Recognition In Video
N. Siddharth, Andrei Barbu, Jeffrey Mark Siskind

TL;DR
This paper presents a video activity recognition system in which a sentence guides visual attention, integrating language with vision. The same framework supports three tasks on multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions, and query-based video search.
Contribution
The paper presents a framework in which the grammatical structure of a sentence (nouns, adjectives, verbs, adverbs, and prepositions) guides activity recognition in multi-activity videos, combining vision and language.
Findings
Effective sentence-guided focus of attention demonstrated
Generated sentential descriptions of videos successfully
Enabled query-based video search using the framework
Abstract
We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, thereby providing a medium, not only for top-down and bottom-up integration, but also for multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions) in the form of whole sentential descriptions mediated by a grammar, guides the activity-recognition process. Further, the utility and expressiveness of our framework is demonstrated by performing three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems
