On-Device Vision Training, Deployment, and Inference on a Thumb-Sized Microcontroller

Jeremy Ellis

arXiv:2604.23012·cs.LG·April 28, 2026

TL;DR

This paper demonstrates a complete on-device vision ML pipeline running entirely on a microcontroller, enabling real-time image classification without external infrastructure.

Contribution

It introduces an end-to-end vision system that performs training, deployment, and inference on the microcontroller itself, implemented in approximately 1,750 lines of C++ with no external ML dependencies.
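One of the optimizations the abstract names is pre-computed resize lookup tables. The listing does not show how the firmware builds or uses them, so the following is only a minimal sketch of the general idea: compute the source index for every destination row and column once, then reuse those tables for each frame instead of repeating the scaling arithmetic per pixel. The function names `buildResizeLut` and `resizeWithLut`, the nearest-neighbour sampling, and the single-channel (grayscale) layout are all assumptions made for illustration, not details taken from the paper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: for each destination index, store the
// nearest-neighbour source index. Built once, reused for every frame.
std::vector<std::uint16_t> buildResizeLut(int srcSize, int dstSize) {
    std::vector<std::uint16_t> lut(static_cast<std::size_t>(dstSize));
    for (int d = 0; d < dstSize; ++d) {
        // Integer scaling: floor(d * srcSize / dstSize).
        lut[static_cast<std::size_t>(d)] =
            static_cast<std::uint16_t>((static_cast<long>(d) * srcSize) / dstSize);
    }
    return lut;
}

// Hypothetical helper: resize a row-major, single-channel image using
// the two precomputed tables (xLut for columns, yLut for rows).
// dst must hold xLut.size() * yLut.size() bytes.
void resizeWithLut(const std::uint8_t* src, int srcWidth,
                   std::uint8_t* dst,
                   const std::vector<std::uint16_t>& xLut,
                   const std::vector<std::uint16_t>& yLut) {
    const std::size_t dstWidth = xLut.size();
    for (std::size_t y = 0; y < yLut.size(); ++y) {
        const std::uint8_t* srcRow =
            src + static_cast<std::size_t>(yLut[y]) * static_cast<std::size_t>(srcWidth);
        std::uint8_t* dstRow = dst + y * dstWidth;
        for (std::size_t x = 0; x < dstWidth; ++x) {
            dstRow[x] = srcRow[xLut[x]];  // table lookup, no per-pixel division
        }
    }
}
```

In a setup like this, the two tables would be built once at startup (for example for a 64x64 destination) and then passed to `resizeWithLut` for every captured frame, so the inner loop contains only loads and stores.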

Findings

01 Achieves 6.3 FPS inference on a $15 microcontroller

02 Completes training in approximately 9 minutes for three-class classification

03 Provides open-source code and datasets for reproducibility

Abstract

This paper presents a complete, end-to-end on-device vision machine learning pipeline, comprising data acquisition, two-layer CNN training with Adam optimization, and real-time inference, executing entirely on a microcontroller-class device costing $15-40 USD. Unlike cloud-based workflows that require external infrastructure and conceal the computational pipeline from the practitioner, this system implements every step of the core ML lifecycle in approximately 1,750 lines of readable C++ that compiles in under one minute using the Arduino IDE, with no external ML dependencies. Running on the Seeed Studio ESP32-S3 XIAO ML Kit (8 MB PSRAM), the firmware achieves three-class 64x64 image classification in approximately 9 minutes per training run, with real-time inference at 6.3 FPS. Key contributions include: correct batch-level gradient accumulation; pre-computed resize lookup tables for…
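The abstract names batch-level gradient accumulation with Adam as a contribution. As a rough illustration of what that phrase usually means, the sketch below sums per-sample gradients into an accumulator and applies a single Adam update from their mean at the end of the batch, rather than updating after every sample. The type `AdamState`, the functions `accumulate` and `adamStep`, the flat parameter vector, and the default hyperparameters (the common Adam defaults) are assumptions for this sketch; the paper's firmware may organise this differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type: Adam state for one flat parameter vector.
struct AdamState {
    std::vector<float> m;  // first-moment (mean) estimate
    std::vector<float> v;  // second-moment (uncentred variance) estimate
    int step = 0;          // number of updates applied so far
    explicit AdamState(std::size_t n) : m(n, 0.0f), v(n, 0.0f) {}
};

// Add one sample's gradient into the batch accumulator.
void accumulate(std::vector<float>& acc, const std::vector<float>& grad) {
    for (std::size_t i = 0; i < acc.size(); ++i) {
        acc[i] += grad[i];
    }
}

// Apply ONE Adam update from the mean of the accumulated gradients,
// then clear the accumulator for the next batch.
void adamStep(std::vector<float>& w, std::vector<float>& acc, AdamState& s,
              int batchSize, float lr = 1e-3f, float beta1 = 0.9f,
              float beta2 = 0.999f, float eps = 1e-8f) {
    s.step += 1;
    // Bias-correction terms for the zero-initialised moment estimates.
    const float c1 = 1.0f - std::pow(beta1, static_cast<float>(s.step));
    const float c2 = 1.0f - std::pow(beta2, static_cast<float>(s.step));
    for (std::size_t i = 0; i < w.size(); ++i) {
        const float g = acc[i] / static_cast<float>(batchSize);  // batch mean
        s.m[i] = beta1 * s.m[i] + (1.0f - beta1) * g;
        s.v[i] = beta2 * s.v[i] + (1.0f - beta2) * g * g;
        const float mHat = s.m[i] / c1;
        const float vHat = s.v[i] / c2;
        w[i] -= lr * mHat / (std::sqrt(vHat) + eps);
        acc[i] = 0.0f;  // reset so the next batch starts from zero
    }
}
```

A training loop built on this would call `accumulate` once per sample and `adamStep` once per batch, so the moment estimates and the step counter advance per batch rather than per sample.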

Code & Models

Repositories

webmcu-ai/on-device-vision-ai (GitHub)
