PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications

Pietro Bonazzi; Nicola Farronato; Stefan Zihlmann; Haotong Qin; Michele Magno

arXiv:2506.18807 · cs.CV · November 12, 2025

TL;DR

PicoSAM2 is a lightweight, promptable segmentation model small enough to run inside the image sensor itself, enabling real-time, privacy-preserving segmentation on edge devices.

Contribution

The paper introduces PicoSAM2, a low-parameter, low-compute promptable segmentation model for in-sensor deployment. It is built on a depthwise separable U-Net and trained with knowledge distillation from SAM2.
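To illustrate why a depthwise separable design keeps the parameter and MAC budget low, the sketch below compares the cost of one standard convolution with its depthwise separable counterpart. The layer shape (64 to 128 channels, 3x3 kernel, 32x32 feature map) is a hypothetical example chosen for illustration, not a layer taken from PicoSAM2, and biases are ignored.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """Return (weights, MACs) for a k x k standard convolution.

    Assumes stride 1 and 'same' padding, so the output is h x w.
    """
    params = k * k * c_in * c_out
    macs = params * h * w  # every weight is used once per output pixel
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """Return (weights, MACs) for a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * h * w
    return params, macs


# Hypothetical layer shape, used only for illustration.
std_params, std_macs = standard_conv_cost(64, 128, 3, 32, 32)
sep_params, sep_macs = depthwise_separable_cost(64, 128, 3, 32, 32)

print(f"standard:  {std_params:>7} weights, {std_macs:>10} MACs")
print(f"separable: {sep_params:>7} weights, {sep_macs:>10} MACs")
print(f"reduction: {std_params / sep_params:.1f}x")
```

For this shape the separable layer needs 8,768 weights instead of 73,728, roughly an 8.4x reduction, and the MAC count shrinks by the same factor.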

Findings

01

Achieves 51.9% mIoU on COCO and 44.9% on LVIS.

02

Runs in 14.3 ms on the Sony IMX500, achieving 86 MACs/cycle.

03

The quantized model is 1.22 MB, small enough for in-sensor deployment.
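The headline numbers can be cross-checked with some back-of-envelope arithmetic. The sketch below combines the latency, MACs/cycle, and model size above with the parameter and MAC counts quoted in the abstract (1.3M parameters, 336M MACs). The derived quantities are illustrative estimates, not figures reported by the authors; in particular, the "implied clock" assumes the MACs/cycle figure is simply total MACs divided by total cycles, and 1 MB is taken as 10^6 bytes.

```python
# Figures quoted on this page.
MACS_PER_INFERENCE = 336e6   # from the abstract
PARAMS = 1.3e6               # from the abstract
LATENCY_S = 14.3e-3          # IMX500 latency
MACS_PER_CYCLE = 86          # reported efficiency
MODEL_BYTES = 1.22e6         # quantized size, assuming 1 MB = 10**6 bytes

# Derived estimates (assumptions, not reported results).
frames_per_second = 1.0 / LATENCY_S
throughput_macs_per_s = MACS_PER_INFERENCE / LATENCY_S
cycles_per_inference = MACS_PER_INFERENCE / MACS_PER_CYCLE
implied_clock_hz = cycles_per_inference / LATENCY_S
bytes_per_param = MODEL_BYTES / PARAMS

print(f"frame rate:      {frames_per_second:.1f} inferences/s")
print(f"throughput:      {throughput_macs_per_s / 1e9:.1f} GMAC/s")
print(f"implied clock:   {implied_clock_hz / 1e6:.0f} MHz")
print(f"bytes per param: {bytes_per_param:.2f}")
```

Under these assumptions the model sustains about 70 inferences per second, and the size works out to just under one byte per parameter, which is consistent with 8-bit quantization.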

Abstract

Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications like smart glasses and IoT devices. We introduce PicoSAM2, a lightweight (1.3M parameters, 336M MACs) promptable segmentation model optimized for edge and in-sensor execution, including the Sony IMX500. It builds on a depthwise separable U-Net, with knowledge distillation and fixed-point prompt encoding to learn from the Segment Anything Model 2 (SAM2). On COCO and LVIS, it achieves 51.9% and 44.9% mIoU, respectively. The quantized model (1.22 MB) runs in 14.3 ms on the IMX500, achieving 86 MACs/cycle, making it the only model meeting both memory and compute constraints for in-sensor deployment. Distillation boosts LVIS performance by +3.5% mIoU and +5.1% mAP. These results demonstrate that efficient, promptable segmentation is feasible directly on-camera, enabling privacy-preserving vision…
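The abstract does not spell out the distillation objective, so the following is only a minimal sketch of one common way to distill a mask predictor: blend a hard-label loss against the ground-truth mask with a soft loss that pulls the student's logits toward the teacher's. The specific loss terms (binary cross-entropy and mean squared error), the `alpha` weighting, and all function names are assumptions made for illustration and should not be read as the authors' training recipe. Masks are flattened to plain Python lists to keep the example dependency-free.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy between logits and 0/1 ground-truth labels."""
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(logits)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_logits, teacher_logits, gt_mask, alpha=0.5):
    """Blend a teacher-matching term with a ground-truth term.

    alpha = 1.0 trains purely against the teacher; alpha = 0.0 ignores it.
    This weighting scheme is an assumption, not taken from the paper.
    """
    soft = mse(student_logits, teacher_logits)    # match the teacher (e.g. SAM2) logits
    hard = bce_with_logits(student_logits, gt_mask)  # match the annotated mask
    return alpha * soft + (1.0 - alpha) * hard


# Toy 4-pixel mask: the student is close to the teacher but not identical.
student = [2.0, -1.5, 0.5, -3.0]
teacher = [3.0, -2.0, 1.5, -4.0]
gt = [1, 0, 1, 0]
print(f"loss: {distillation_loss(student, teacher, gt):.4f}")
```

In a setup like this, the teacher term gives the student a dense, per-pixel training signal even where ground-truth annotations are coarse, which is one plausible reason distillation would help on a long-tailed dataset such as LVIS.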

Taxonomy

Methods: Knowledge Distillation