LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

Zuxuan Wu; Caiming Xiong; Yu-Gang Jiang; Larry S. Davis

arXiv:1912.01601 · cs.CV · December 4, 2019 · 32 citations

TL;DR

LiteEval is a resource-efficient video recognition framework that always extracts cheap coarse features and uses cooperating LSTMs and a gating module to decide, frame by frame, whether to also compute more expensive fine features, substantially reducing computation while maintaining high accuracy.

Contribution

It introduces a novel coarse-to-fine framework with dynamic computation control for efficient video recognition, applicable in online and offline settings.

Findings

01. Requires less computation than existing methods
02. Achieves high classification accuracy on FCVID and ActivityNet
03. Effective in both online and offline scenarios
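LiteEval computes cheap coarse features for every frame and expensive fine features only for the frames where its gating module fires, so its average per-frame cost depends on how often the gate fires. A minimal sketch of that accounting follows, assuming the gate and LSTM costs are negligible. The function name `average_cost_per_frame` and the GFLOPs figures are hypothetical placeholders, not numbers reported in the paper.

```python
def average_cost_per_frame(decisions, coarse_cost, fine_cost):
    """Mean per-frame cost of a coarse-to-fine pass.

    decisions: one bool per frame, True when fine features were computed.
    coarse_cost / fine_cost: cost of one coarse or fine feature extraction
    (any unit, e.g. GFLOPs). Gate and LSTM costs are ignored (assumption).
    """
    if not decisions:
        raise ValueError("need at least one frame")
    fine_fraction = sum(decisions) / len(decisions)
    # Coarse features are always computed; fine features only when the gate fires.
    return coarse_cost + fine_fraction * fine_cost


# Hypothetical costs: a light CNN on downscaled frames vs. a heavy CNN at full resolution.
COARSE_GFLOPS, FINE_GFLOPS = 0.3, 7.8
gate_decisions = [True, False, False, True, False, False, False, False, False, False]

lite = average_cost_per_frame(gate_decisions, COARSE_GFLOPS, FINE_GFLOPS)
dense = FINE_GFLOPS  # baseline that runs the heavy CNN on every frame
print(f"coarse-to-fine: {lite:.2f} GFLOPs/frame vs. dense: {dense:.2f} GFLOPs/frame")
# prints: coarse-to-fine: 1.86 GFLOPs/frame vs. dense: 7.80 GFLOPs/frame
```

Under this simple model the saving grows as the gate fires on fewer frames, and it disappears if the gate fires on every frame, because the coarse pass then becomes pure overhead.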

Abstract

This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource-efficient video recognition, suitable for both online and offline scenarios. Exploiting decent yet computationally efficient features derived at a coarse scale with a lightweight CNN model, LiteEval dynamically decides on-the-fly whether to compute more powerful features for incoming video frames at a finer scale to obtain more details. This is achieved by a coarse LSTM and a fine LSTM operating cooperatively, as well as a conditional gating module to learn when to allocate more computation. Extensive experiments are conducted on two large-scale video benchmarks, FCVID and ActivityNet, and the results demonstrate LiteEval requires substantially less computation while offering excellent classification accuracy for both online and offline predictions.
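The inference loop described in the abstract can be sketched in NumPy. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation. The feature extractors are hypothetical stand-ins (slices of a random vector instead of CNNs). The gate is assumed to be a single sigmoid unit that reads the coarse LSTM state and the previous fine LSTM state and fires above a fixed threshold. When the gate does not fire, the fine LSTM is assumed to keep its previous state. Names such as `lite_eval`, `coarse_features`, and `fine_features` are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Minimal LSTM cell: sigmoid input/forget/output gates, tanh candidate."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix produces all four pre-activations from [x, h].
        self.W = rng.standard_normal((4 * hid_dim, in_dim + hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)

    def zero_state(self):
        return np.zeros(self.hid_dim), np.zeros(self.hid_dim)

    def __call__(self, x, state):
        h, c = state
        i, f, o, g = np.split(self.W @ np.concatenate([x, h]) + self.b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c


COARSE_DIM, FINE_DIM, HID = 16, 64, 32


def coarse_features(frame):
    # Hypothetical stand-in for a lightweight CNN applied at a coarse scale.
    return frame[:COARSE_DIM]


def fine_features(frame):
    # Hypothetical stand-in for a more powerful CNN applied at a finer scale.
    return frame[:FINE_DIM]


def lite_eval(frames, coarse_lstm, fine_lstm, w_gate, b_gate, threshold=0.5):
    """Run the coarse-to-fine loop; return a video feature and per-frame gate decisions."""
    coarse_state = coarse_lstm.zero_state()
    fine_state = fine_lstm.zero_state()
    decisions = []
    for frame in frames:
        x_coarse = coarse_features(frame)  # cheap features for every frame
        coarse_state = coarse_lstm(x_coarse, coarse_state)
        # Gate input: current coarse hidden state + previous fine hidden state (assumption).
        gate_in = np.concatenate([coarse_state[0], fine_state[0]])
        use_fine = bool(sigmoid(w_gate @ gate_in + b_gate) > threshold)
        if use_fine:
            # Expensive features only when the gate fires; the fine LSTM sees both scales.
            x_fine = fine_features(frame)
            fine_state = fine_lstm(np.concatenate([x_coarse, x_fine]), fine_state)
        # Otherwise the fine LSTM state is carried over unchanged (assumption).
        decisions.append(use_fine)
    # A classifier would read this concatenated state (assumption).
    return np.concatenate([coarse_state[0], fine_state[0]]), decisions


rng = np.random.default_rng(0)
coarse_lstm = LSTMCell(COARSE_DIM, HID, rng)
fine_lstm = LSTMCell(COARSE_DIM + FINE_DIM, HID, rng)
w_gate = rng.standard_normal(2 * HID) * 0.1
frames = rng.standard_normal((10, FINE_DIM))  # 10 toy "frames"

video_feature, decisions = lite_eval(frames, coarse_lstm, fine_lstm, w_gate, b_gate=0.0)
print(f"fine features computed for {sum(decisions)} of {len(decisions)} frames")
```

In LiteEval the gating module is learned so that it decides when extra computation is worthwhile; here its weights are random, so the decisions are arbitrary. Raising `threshold` makes this sketch skip the fine branch on more frames.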

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory