PicoSAM3: Real-Time In-Sensor Region-of-Interest Segmentation
Pietro Bonazzi, Nicola Farronato, Stefan Zihlmann, Haotong Qin, Michele Magno

TL;DR
PicoSAM3 is a lightweight, promptable segmentation model with 1.3 M parameters, designed for real-time, in-sensor deployment on edge devices such as the Sony IMX500, where its INT8 version runs at 11.82 ms latency.
Contribution
The paper introduces PicoSAM3, a compact segmentation model optimized for in-sensor execution. It combines a dense CNN architecture with region-of-interest prompt encoding, Efficient Channel Attention, and knowledge distillation from SAM2 and SAM3.
Findings
Achieves 65.45% mIoU on COCO and 64.01% on LVIS.
Real-time inference at 11.82 ms latency on IMX500.
INT8 quantization maintains accuracy with negligible degradation.
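For readers unfamiliar with the metric, the following is a minimal sketch of how a mean IoU over binary masks can be computed. It assumes one predicted mask per prompted instance and a plain average over instances; the paper's exact evaluation protocol is not given in this summary, and the function names `mask_iou` and `mean_iou` are hypothetical.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    inter = np.logical_and(pred, target).sum()
    return float(inter) / float(union)


def mean_iou(preds, targets):
    """Average the per-instance IoU over paired mask lists."""
    ious = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return float(np.mean(ious))
```

A score such as 65.45% would correspond to `mean_iou(...)` returning roughly 0.6545 under this convention.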
Abstract
Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications such as smart glasses and Internet-of-Things devices. We introduce PicoSAM3, a lightweight promptable visual segmentation model optimized for edge and in-sensor execution, including deployment on the Sony IMX500 vision sensor. PicoSAM3 has 1.3 M parameters and combines a dense CNN architecture with region of interest prompt encoding, Efficient Channel Attention, and knowledge distillation from SAM2 and SAM3. On COCO and LVIS, PicoSAM3 achieves 65.45% and 64.01% mIoU, respectively, outperforming existing SAM-based and edge-oriented baselines at similar or lower complexity. The INT8 quantized model preserves accuracy with negligible degradation while enabling real-time in-sensor inference at 11.82 ms latency on the IMX500, fully complying with its memory and operator constraints. Ablation…
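The Efficient Channel Attention component named in the abstract can be illustrated with a short NumPy sketch of the general technique: pool each channel to a scalar, run a small 1D convolution across the channel axis, and use the sigmoid of the result to rescale the channels. This is not the authors' implementation. The function name `eca`, the zero padding, and the caller-supplied `weights` (standing in for learned kernel values) are assumptions, and the summary does not say where in PicoSAM3 the block sits or which kernel size it uses.

```python
import numpy as np


def eca(x, weights):
    """Apply ECA-style channel gating to a (C, H, W) feature map.

    weights: 1D kernel of odd length, a stand-in for learned values.
    """
    x = np.asarray(x, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    k = weights.shape[0]
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    channels = x.shape[0]
    # Global average pooling: one descriptor per channel.
    pooled = x.mean(axis=(1, 2))
    # Zero-pad so the 1D convolution keeps the channel count.
    padded = np.pad(pooled, k // 2)
    # Slide the kernel across neighbouring channels.
    mixed = np.array(
        [np.dot(padded[i:i + k], weights) for i in range(channels)]
    )
    # Sigmoid gate in (0, 1), broadcast over the spatial dimensions.
    gate = 1.0 / (1.0 + np.exp(-mixed))
    return x * gate[:, None, None]
```

The appeal for a 1.3 M parameter model is that the attention adds only `k` weights per block, since channels interact through a local 1D kernel rather than fully connected layers.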
Taxonomy
Topics: Advanced Neural Network Applications · IoT and Edge/Fog Computing · Age of Information Optimization
