TL;DR
MR2-ByteTrack is a novel video object detection method designed for ultra-low-power embedded vision sensors, combining multi-resolution inference with detection linking and correction to maintain accuracy while significantly reducing computational load.
Contribution
Introduces MR2-ByteTrack, a versatile VOD approach that enables real-time, energy-efficient object detection on MCU-based embedded vision nodes using CNN and Transformer architectures.
Findings
Achieves up to 49.0 mAP with CNN models and 48.7 mAP with Transformer models on ImageNetVID.
Reduces multiply-accumulate operations by up to 53% for CNNs and 32% for Transformers.
Enables real-time Transformer-based VOD on an ultra-low-power MCU with 55% energy savings.
Abstract
Modern smart vision sensors need on-device intelligence to process video streams, as cloud computing is often impractical due to bandwidth, latency, and privacy constraints. However, these sensory systems typically rely on ultra-low-power microcontrollers (MCUs) with limited memory and compute, making conventional video object detection methods, which require feature storage or multi-frame buffering, unfeasible. To address this challenge, we introduce Multi-Resolution Rescored ByteTrack (MR2-ByteTrack), a Video Object Detection (VOD) method tailored for MCU-based embedded vision nodes. MR2-ByteTrack reduces computational cost by alternating between full- and low-resolution inference, while linking detections across frames via ByteTrack and correcting misclassifications through the Rescore algorithm, which applies probability union rules to aggregate detection confidence scores across…
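The abstract breaks off before it describes Rescore in detail, so the following is only a minimal sketch of what a probability union rule over a track's confidence scores could look like. It assumes each linked detection carries a class label and a confidence, treats the per-frame confidences as independent events, and combines them as P(A or B) = P(A) + P(B) - P(A)P(B). The names `union_score` and `rescore_track` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def union_score(p_prev, p_new):
    """Union of two independent events: P(A or B) = P(A) + P(B) - P(A) * P(B)."""
    return p_prev + p_new - p_prev * p_new


def rescore_track(detections):
    """Aggregate per-class confidence along one track (hypothetical helper).

    detections: list of (class_label, confidence) pairs in frame order,
    all belonging to the same linked track.
    Returns the label with the highest aggregated score and the score table.
    """
    scores = defaultdict(float)  # every class starts at probability 0
    for label, conf in detections:
        scores[label] = union_score(scores[label], conf)
    best_label = max(scores, key=scores.get)
    return best_label, dict(scores)


# One track whose middle frame was classified differently from the others.
track = [("dog", 0.6), ("cat", 0.5), ("dog", 0.5)]
label, scores = rescore_track(track)
```

Under these assumptions, repeated evidence for one class accumulates toward 1, so a single frame with a different label is outvoted by the rest of the track. How the actual method weights frames or resolutions is not stated in the text above.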
