SBCFormer: Lightweight Network Capable of Full-size ImageNet Classification at 1 FPS on Single Board Computers
Xiangyong Lu, Masanori Suganuma, Takayuki Okatani

TL;DR
SBCFormer is a novel lightweight CNN-ViT hybrid network designed for full-size ImageNet classification on low-end single board computers, achieving high accuracy at 1 FPS on Raspberry Pi 4.
Contribution
It introduces an architectural design that balances attention mechanism efficiency and local detail preservation for SBCs, enabling high-accuracy ImageNet classification.
Findings
Achieves around 80% top-1 accuracy on ImageNet-1K.
Runs at 1 frame/sec on Raspberry Pi 4.
Outperforms existing lightweight models in accuracy-speed trade-off.
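The 1 frame/sec figure above is a throughput measurement. As a rough illustration of how such a number can be obtained, here is a minimal timing sketch. It is not the authors' benchmark protocol: `measure_fps` and `dummy_infer` are hypothetical names, and the 10 ms sleep is only a stand-in for a real forward pass on one image.

```python
import time


def measure_fps(infer, warmup=2, runs=5):
    """Return the average number of `infer` calls completed per second."""
    # Warm-up calls are not timed; they absorb one-off costs such as cache fills.
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(runs):
        infer()
    elapsed = time.perf_counter() - start
    return runs / elapsed


def dummy_infer():
    # Hypothetical stand-in for a single-image forward pass (assumption).
    time.sleep(0.01)


if __name__ == "__main__":
    print(f"{measure_fps(dummy_infer):.1f} frames/sec")
```

On a real device, `dummy_infer` would be replaced by a call that runs the model on one preprocessed image; averaging over more runs gives a steadier estimate.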
Abstract
Computer vision has become increasingly prevalent in solving real-world problems across diverse domains, including smart agriculture, fishery, and livestock management. These applications may not require processing many image frames per second, leading practitioners to use single board computers (SBCs). Although many lightweight networks have been developed for mobile/edge devices, they primarily target smartphones with more powerful processors and not SBCs with low-end CPUs. This paper introduces a CNN-ViT hybrid network called SBCFormer, which achieves high accuracy and fast computation on such low-end CPUs. The hardware constraints of these CPUs make the Transformer's attention mechanism preferable to convolution. However, using attention on low-end CPUs presents a challenge: high-resolution internal feature maps demand excessive computational resources, but reducing their…
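To see why high-resolution feature maps make attention expensive, the following sketch counts the multiply-accumulate operations in the two matrix products of standard self-attention (QK^T and the attention-weighted sum over V). The grid sizes and the channel width of 64 are illustrative assumptions, not values from the paper, and the count leaves out the linear projections and the softmax.

```python
def attention_macs(height, width, dim):
    """Multiply-accumulates for QK^T plus (attention x V) on a height x width token grid."""
    n = height * width  # number of tokens
    # QK^T costs n * n * dim, and attention x V costs another n * n * dim.
    return 2 * n * n * dim


if __name__ == "__main__":
    # Illustrative grid sizes; dim=64 is an assumed channel width.
    for side in (56, 28, 14, 7):
        print(f"{side}x{side}: {attention_macs(side, side, 64):,} MACs")
```

Because the cost grows with the square of the token count, halving each side of the feature map cuts this part of the computation by a factor of 16.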
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Advanced Image and Video Retrieval Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
