Boosting Mobile CNN Inference through Semantic Memory
Yun Li, Chen Zhang, Shihao Han, Li Lyna Zhang, Baoqun Yin, Yunxin Liu, Mengwei Xu

TL;DR
This paper introduces SMTM, a semantic memory system inspired by human visual recognition, that accelerates on-device CNN inference by leveraging hierarchical memory, semantic encoding, and adaptive caching, achieving up to 2X speedup.
Contribution
It presents a novel semantic memory architecture for CNN inference that encodes feature maps into low-dimensional semantic vectors and adaptively manages the cache, improving inference speed over both standard inference and prior cache-based methods.
Findings
Up to 2X faster inference on mobile devices.
Significant speedup over standard and prior cache methods.
Acceptable accuracy loss with the proposed approach.
Abstract
Human brains are known to be capable of speeding up visual recognition of repeatedly presented objects through faster memory encoding and accessing procedures on activated neurons. For the first time, we borrow and distill such a capability into a semantic memory design, namely SMTM, to improve on-device CNN inference. SMTM employs a hierarchical memory architecture to leverage the long-tail distribution of objects of interest, and further incorporates several novel techniques to put it into effect: (1) it encodes high-dimensional feature maps into low-dimensional, semantic vectors for low-cost yet accurate cache and lookup; (2) it uses a novel metric in determining the exit timing considering different layers' inherent characteristics; (3) it adaptively adjusts the cache size and semantic vectors to fit the scene dynamics. SMTM is prototyped on a commodity CNN engine and runs on both…
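To make the cache-and-exit idea in the abstract concrete, here is a minimal sketch. It is not SMTM's implementation: the abstract does not say how feature maps are encoded, what the exit metric is, or how the cache is resized. The sketch therefore assumes global average pooling as the encoder, a cosine-similarity margin between the two best cache entries as the exit test, and least-hit eviction with a running-average update as the cache policy. All names (encode, SemanticCache, margin, rate) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# A feature map is indexed as [channel][row][column].
FeatureMap = Sequence[Sequence[Sequence[float]]]


def encode(feature_map: FeatureMap) -> List[float]:
    """Assumed encoder: average each channel, then L2-normalise the result."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Hypothetical per-layer cache mapping a class label to one vector."""

    def __init__(self, capacity: int, margin: float) -> None:
        self.capacity = capacity
        self.margin = margin  # assumed exit metric: best minus runner-up
        self.entries: Dict[str, List[float]] = {}
        self.hits: Dict[str, int] = {}

    def lookup(self, vec: Sequence[float]) -> Optional[str]:
        """Return a label to exit early with, or None to keep running the CNN."""
        if not self.entries:
            return None
        scored = sorted(
            ((cosine(vec, c), label) for label, c in self.entries.items()),
            reverse=True,
        )
        best_score, best_label = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0.0
        if best_score - runner_up >= self.margin:
            self.hits[best_label] += 1
            return best_label
        return None

    def update(self, label: str, vec: Sequence[float], rate: float = 0.5) -> None:
        """Blend a new vector into an entry, evicting the least-hit label if full."""
        if label in self.entries:
            old = self.entries[label]
            self.entries[label] = [(1 - rate) * o + rate * n for o, n in zip(old, vec)]
            return
        if len(self.entries) >= self.capacity:
            coldest = min(self.hits, key=lambda k: self.hits[k])
            del self.entries[coldest]
            del self.hits[coldest]
        self.entries[label] = list(vec)
        self.hits[label] = 0
```

In use, an intermediate layer's output would be passed through encode and then lookup; a returned label skips the remaining layers, while None lets inference continue, after which update can refresh the cache with the final prediction. A margin test was chosen here only because it is easy to reason about; the paper's own metric is described as layer-aware and may differ substantially.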
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · CCD and CMOS Imaging Sensors
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
