SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers

Xiangyong Lu; Masanori Suganuma; Takayuki Okatani

arXiv:2311.03747·cs.CV·December 22, 2023·2 cites

TL;DR

SBCFormer is a lightweight CNN-ViT hybrid network designed for full-size ImageNet classification on low-end single board computers, achieving around 80% top-1 accuracy at 1 FPS on a Raspberry Pi 4.

Contribution

The paper introduces an architectural design that balances the efficiency of the attention mechanism against the preservation of local image details on SBCs, enabling high-accuracy ImageNet classification.

Findings

1. Achieves around 80% top-1 accuracy on ImageNet-1K.
2. Runs at 1 frame/sec on a Raspberry Pi 4.
3. Outperforms existing lightweight models in the accuracy-speed trade-off.

Abstract

Computer vision has become increasingly prevalent in solving real-world problems across diverse domains, including smart agriculture, fishery, and livestock management. These applications may not require processing many image frames per second, leading practitioners to use single board computers (SBCs). Although many lightweight networks have been developed for mobile/edge devices, they primarily target smartphones with more powerful processors rather than SBCs with low-end CPUs. This paper introduces a CNN-ViT hybrid network called SBCFormer, which achieves high accuracy and fast computation on such low-end CPUs. The hardware constraints of these CPUs make the Transformer's attention mechanism preferable to convolution. However, using attention on low-end CPUs presents a challenge: high-resolution internal feature maps demand excessive computational resources, but reducing their…
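
To illustrate why feature-map resolution matters for attention cost, here is a minimal Python sketch. It is not the paper's implementation. It assumes plain single-head self-attention applied over every spatial position of a feature map; the function name `attention_macs`, the channel width of 64, and the listed resolutions are hypothetical choices made only for illustration.

```python
def attention_macs(height: int, width: int, dim: int) -> int:
    """Approximate multiply-accumulate count for the two matrix products
    in single-head self-attention (Q @ K^T and attention @ V) over an
    H x W feature map with `dim` channels. Projections are ignored."""
    tokens = height * width
    # Each of the two products costs tokens * tokens * dim MACs.
    return 2 * tokens * tokens * dim


if __name__ == "__main__":
    # Illustrative resolutions only; not taken from the paper.
    for side in (56, 28, 14, 7):
        macs = attention_macs(side, side, dim=64)
        print(f"{side}x{side} feature map: {macs:,} MACs")
```

Under these assumptions, halving the side length of the feature map cuts the token count by 4 and the attention cost by 16, which is the pressure toward low-resolution feature maps that the abstract describes.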

Code & Models

Repositories

xyonglu/sbcformer (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings