Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions

Swathikiran Sudhakaran; Oswald Lanz

arXiv:1709.06495·cs.CV·September 20, 2017

TL;DR

This paper introduces a deep learning model combining convolutional neural networks and convolutional LSTMs to recognize first-person interactions, effectively capturing both short-term and long-term spatio-temporal features.

Contribution

It proposes a novel architecture that preserves the spatio-temporal structure of the input until the final classification stage and outperforms existing RGB-based methods on first-person interaction datasets.
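To make "preserves spatio-temporal structure" concrete, a common ConvLSTM formulation is written out below. This is an assumption: the listing does not say which variant the paper uses (for example, whether peephole terms are included). Here $*$ denotes 2D convolution and $\odot$ the elementwise product. The input $X_t$, hidden state $H_t$ and cell state $C_t$ are all channels × height × width feature maps. An ordinary LSTM would instead apply matrix products to flattened vectors, so replacing those products with convolutions is what keeps the spatial layout intact through every recurrent update.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right)\\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right)\\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right)\\
G_t &= \tanh\left(W_{xg} * X_t + W_{hg} * H_{t-1} + b_g\right)\\
C_t &= f_t \odot C_{t-1} + i_t \odot G_t\\
H_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

Because each gate is a convolution, every position of $H_t$ is updated from a local spatial neighbourhood of $X_t$ and $H_{t-1}$ rather than from the whole frame at once.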

Findings

1. Outperforms the state of the art on the UTKinect-FirstPerson dataset
2. Surpasses previous RGB-only methods by more than 20% in recognition accuracy
3. Effective at recognizing interactions involving complex ego-motion

Abstract

In this paper, we present a novel deep learning based approach for addressing the problem of interaction recognition from a first person perspective. The proposed approach uses a pair of convolutional neural networks, whose parameters are shared, for extracting frame level features from successive frames of the video. The frame level features are then aggregated using a convolutional long short-term memory. After all the input video frames are processed, the hidden state of the convolutional long short-term memory is used to classify the video into the respective categories. The two branches of the convolutional neural network perform feature encoding over a short time interval, whereas the convolutional long short-term memory encodes changes over a longer temporal duration. In our network, the spatio-temporal structure of the input is preserved until the very final processing stage.…
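The data flow described in the abstract can be sketched in NumPy as below. This is a minimal, non-authoritative sketch under stated assumptions, not the authors' implementation. The one-layer `encode_frame` is a hypothetical stand-in for each CNN branch. Fusing the two branch outputs by channel concatenation is a hypothetical choice, since the abstract does not say how they are combined. Global average pooling followed by a linear layer is an assumed classifier head on the final hidden state. The ConvLSTM step uses a common formulation without peephole terms. Weights are random, so the outputs only demonstrate tensor shapes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, w, b):
    """Stride-1, zero-padded ("same") 2D convolution (cross-correlation).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,).
    """
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("oikl,ihwkl->ohw", w, windows) + b[:, None, None]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def convlstm_step(x, h, c, w, b):
    """One ConvLSTM update; all gates come from one convolution over stacked [x, h].

    w: (4*C_h, C_x + C_h, k, k), b: (4*C_h,); h and c: (C_h, H, W).
    """
    gates = conv2d_same(np.concatenate([x, h], axis=0), w, b)
    i, f, o, g = np.split(gates, 4, axis=0)
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state keeps its H x W layout
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next


def encode_frame(frame, w_enc, b_enc):
    # HYPOTHETICAL stand-in for one CNN branch: a single conv layer + ReLU.
    return np.maximum(conv2d_same(frame, w_enc, b_enc), 0.0)


def classify_video(frames, params):
    """frames: (T, 3, H, W) -> class logits."""
    w_enc, b_enc, w_lstm, b_lstm, w_fc, b_fc = params
    c_h = w_lstm.shape[0] // 4
    h = np.zeros((c_h,) + frames.shape[2:])
    c = np.zeros_like(h)
    for t in range(len(frames) - 1):
        # Two branches with the SAME weights encode successive frames t and t+1.
        feat_a = encode_frame(frames[t], w_enc, b_enc)
        feat_b = encode_frame(frames[t + 1], w_enc, b_enc)
        x = np.concatenate([feat_a, feat_b], axis=0)  # HYPOTHETICAL fusion by concatenation
        h, c = convlstm_step(x, h, c, w_lstm, b_lstm)
    # ASSUMED head: spatial structure is discarded only here, after the last frame.
    pooled = h.mean(axis=(1, 2))
    return w_fc @ pooled + b_fc


# Usage with random weights (shapes only; the logits carry no meaning).
rng = np.random.default_rng(0)
T, H, W, C_enc, C_h, n_classes, k = 5, 16, 16, 8, 6, 4, 3
params = (
    rng.normal(0, 0.1, (C_enc, 3, k, k)), np.zeros(C_enc),
    rng.normal(0, 0.1, (4 * C_h, 2 * C_enc + C_h, k, k)), np.zeros(4 * C_h),
    rng.normal(0, 0.1, (n_classes, C_h)), np.zeros(n_classes),
)
logits = classify_video(rng.random((T, 3, H, W)), params)
print(logits.shape)  # (4,)
```

Because `h` and `c` keep shape `(C_h, H, W)` at every step, spatial layout is dropped only at the final pooling, which mirrors the abstract's point that the spatio-temporal structure is preserved until the last stage. A trainable version would normally be written in a deep learning framework rather than with plain NumPy loops.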
