Learning Deep and Compact Models for Gesture Recognition

Koustav Mullick; Anoop M. Namboodiri

arXiv:1712.10136 · cs.CV · January 1, 2018

TL;DR

This paper introduces a compact, deep learning model for gesture recognition that balances high accuracy with small size, suitable for mobile devices, by combining 3DCNN-LSTM architecture and knowledge distillation.

Contribution

The paper proposes a novel end-to-end trainable 3DCNN-LSTM model and a knowledge distillation approach to significantly reduce model size while maintaining accuracy.

Findings

01. Achieves near state-of-the-art accuracy with half the model size.

02. Creates a model less than 1MB suitable for real-time mobile use.

03. Reduces model size by over 99% with only 7% accuracy drop.

Abstract

We look at the problem of developing a compact and accurate model for gesture recognition from videos in a deep-learning framework. Towards this we propose a joint 3DCNN-LSTM model that is end-to-end trainable and is shown to be better suited to capture the dynamic information in actions. The solution achieves close to state-of-the-art accuracy on the ChaLearn dataset, with only half the model size. We also explore ways to derive a much more compact representation in a knowledge distillation framework followed by model compression. The final model is less than 1 MB in size, which is less than one hundredth of our initial model, with a drop of 7% in accuracy, and is suitable for real-time gesture recognition on mobile devices.
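The abstract does not spell out the distillation objective. As a rough illustration only, the sketch below assumes the commonly used formulation in which a small student is trained to match the temperature-softened outputs of a larger teacher, blended with the usual cross-entropy on the true label. The function names, the `temperature` and `alpha` parameters, and their default values are hypothetical choices for this example and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """Blend of a soft (teacher-matching) term and a hard (label) term.

    Assumed formulation, not the paper's confirmed objective:
      soft = T^2 * KL(teacher_T || student_T)
      hard = cross-entropy of the student (T = 1) against the true label
      loss = alpha * soft + (1 - alpha) * hard
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = (temperature ** 2) * kl
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.0]   # illustrative logits, not real model outputs
    student = [0.5, 1.5, 0.0]
    print(distillation_loss(student, teacher, true_label=0))
```

When the student's logits equal the teacher's, the soft term vanishes and only the label term remains, which is one quick sanity check for an implementation along these lines.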

Code & Models

Repositories

chriswegmann/drone_steering

Taxonomy

Methods: Knowledge Distillation