Boosting Mobile CNN Inference through Semantic Memory

Yun Li; Chen Zhang; Shihao Han; Li Lyna Zhang; Baoqun Yin; Yunxin Liu; Mengwei Xu

arXiv:2112.02644·cs.CV·December 7, 2021

Open Access

TL;DR

This paper introduces SMTM, a semantic memory system inspired by human visual recognition. It accelerates on-device CNN inference by combining a hierarchical memory, semantic encoding of feature maps, and adaptive caching, achieving up to 2X speedup.

Contribution

It presents a novel semantic memory architecture for CNN inference that encodes feature maps into low-dimensional semantic vectors and adaptively manages the cache, speeding up inference relative to standard inference and prior cache-based methods.
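The idea of an adaptively managed cache of semantic vectors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `SemanticCache`, the running-mean centroid per class, and the frequency-based eviction rule are all hypothetical stand-ins for whatever SMTM actually does.

```python
from collections import Counter


class SemanticCache:
    """Hypothetical cache: one running-mean vector per frequently seen class."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = Counter()  # how often each class has been observed
        self.entries = {}      # label -> (centroid, number of samples averaged)

    def update(self, label, vec):
        """Fold a new semantic vector into the centroid for `label`."""
        self.freq[label] += 1
        centroid, n = self.entries.get(label, ([0.0] * len(vec), 0))
        n += 1
        # Incremental mean: c_new = c_old + (v - c_old) / n
        centroid = [c + (v - c) / n for c, v in zip(centroid, vec)]
        self.entries[label] = (centroid, n)
        self._evict()

    def resize(self, capacity):
        """Change the cache size, e.g. when the scene changes (assumed policy)."""
        self.capacity = capacity
        self._evict()

    def _evict(self):
        # Drop the least frequently observed classes until the cache fits.
        while len(self.entries) > self.capacity:
            coldest = min(self.entries, key=lambda k: self.freq[k])
            del self.entries[coldest]


cache = SemanticCache(capacity=2)
cache.update("cat", [1.0, 0.0])
cache.update("cat", [0.0, 1.0])
cache.update("dog", [0.0, 1.0])
cache.update("bird", [1.0, 1.0])  # cache is over capacity, so one rare class is dropped
```

Keeping only the most frequent classes is one simple way to exploit a long-tail distribution of objects: a small cache still covers most lookups.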

Findings

01 Up to 2X faster inference on mobile devices.

02 Significant speedup over standard and prior cache methods.

03 Acceptable accuracy loss with the proposed approach.

Abstract

Human brains are known to be capable of speeding up visual recognition of repeatedly presented objects through faster memory encoding and accessing procedures on activated neurons. For the first time, we borrow and distill such a capability into a semantic memory design, namely SMTM, to improve on-device CNN inference. SMTM employs a hierarchical memory architecture to leverage the long-tail distribution of objects of interest, and further incorporates several novel techniques to put it into effect: (1) it encodes high-dimensional feature maps into low-dimensional, semantic vectors for low-cost yet accurate cache and lookup; (2) it uses a novel metric in determining the exit timing considering different layers' inherent characteristics; (3) it adaptively adjusts the cache size and semantic vectors to fit the scene dynamics. SMTM is prototyped on a commodity CNN engine and runs on both…
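Techniques (1) and (2) can be pictured with a rough sketch of cache lookup and early exit. The abstract does not specify the encoder or the exit metric, so this sketch swaps in simple stand-ins that are assumptions, not SMTM's method: global average pooling as the encoder, cosine similarity for lookup, and a fixed top-1 versus top-2 similarity margin as the exit test. The function names `encode`, `lookup`, and `infer` are hypothetical.

```python
import math


def encode(feature_map):
    """Assumed encoder: average each channel of a C x H x W map into one value."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lookup(cache, vec, margin=0.2):
    """Return the best-matching label only if it beats the runner-up by `margin`."""
    ranked = sorted(cache.items(), key=lambda kv: cosine(vec, kv[1]), reverse=True)
    if not ranked:
        return None
    best = cosine(vec, ranked[0][1])
    second = cosine(vec, ranked[1][1]) if len(ranked) > 1 else 0.0
    return ranked[0][0] if best - second >= margin else None


def infer(x, layers, caches, classifier, margin=0.2):
    """Run layers in order; exit as soon as a per-layer cache gives a confident hit."""
    for i, layer in enumerate(layers):
        x = layer(x)
        label = lookup(caches.get(i, {}), encode(x), margin)
        if label is not None:
            return label, i + 1            # early exit after i + 1 layers
    return classifier(x), len(layers)      # cache miss: full inference


# Toy usage: identity "layers" and a two-class cache attached to layer 0.
identity = lambda t: t
caches = {0: {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}}
clear = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]      # 2 channels, 2x2
ambiguous = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
hit = infer(clear, [identity, identity], caches, lambda t: "unknown")
miss = infer(ambiguous, [identity, identity], caches, lambda t: "unknown")
```

In this toy run, the clear input matches the cached "cat" vector at the first layer and exits early, while the ambiguous input matches both classes equally and falls through to the full classifier.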

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · CCD and CMOS Imaging Sensors

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings