MR2-ByteTrack: CNN and Transformer-based Video Object Detection for AI-augmented Embedded Vision Sensor Nodes

Luca Bompani; Manuele Rusci; Luca Benini; Daniele Palossi; Francesco Conti

arXiv:2605.15423 · cs.CV · May 18, 2026

TL;DR

MR2-ByteTrack is a novel video object detection method designed for ultra-low-power embedded vision sensors, combining multi-resolution inference with detection linking and correction to maintain accuracy while significantly reducing computational load.

Contribution

Introduces MR2-ByteTrack, a versatile video object detection (VOD) approach that enables real-time, energy-efficient object detection on MCU-based embedded vision nodes using either CNN or Transformer architectures.

Findings

01. Achieves up to 49.0 mAP with CNN models and 48.7 mAP with Transformer models on ImageNetVID.

02. Reduces multiply-accumulate (MAC) operations by up to 53% for CNNs and 32% for Transformers.

03. Enables real-time Transformer-based VOD on an ultra-low-power MCU with 55% energy savings.
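To give a feel for where MAC savings of this kind can come from, the following is a back-of-the-envelope sketch and not the paper's cost model. It assumes one full-resolution frame every `period` frames and that MACs scale with the number of input pixels; both the function name and its parameters are hypothetical, and the pixel-count scaling is only a rough approximation for convolutional networks.

```python
def average_mac_ratio(period: int, scale: float) -> float:
    """Average per-frame MACs relative to always running at full resolution.

    Assumptions (not taken from the paper):
      - one full-resolution frame every `period` frames, the rest low-resolution;
      - low-resolution frames shrink each image side by `scale`;
      - MACs are proportional to the pixel count, i.e. to scale ** 2.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    low_res_cost = scale ** 2  # relative cost of one low-resolution frame
    return (1.0 + (period - 1) * low_res_cost) / period


# Example: alternate every other frame at half the side length.
ratio = average_mac_ratio(period=2, scale=0.5)
saving = 1.0 - ratio
print(f"relative MACs: {ratio:.3f}, saving: {saving:.1%}")
```

Under these assumptions, the saving grows with a longer period or a smaller scale; the figures reported above depend on the actual schedules and models used in the paper.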

Abstract

Modern smart vision sensors need on-device intelligence to process video streams, as cloud computing is often impractical due to bandwidth, latency, and privacy constraints. However, these sensory systems typically rely on ultra-low-power microcontrollers (MCUs) with limited memory and compute, making conventional video object detection methods, which require feature storage or multi-frame buffering, unfeasible. To address this challenge, we introduce Multi-Resolution Rescored ByteTrack (MR2-ByteTrack), a Video Object Detection (VOD) method tailored for MCU-based embedded vision nodes. MR2-ByteTrack reduces computational cost by alternating between full- and low-resolution inference, while linking detections across frames via ByteTrack and correcting misclassifications through the Rescore algorithm, which applies probability union rules to aggregate detection confidence scores across…
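The abstract is truncated here, so the exact Rescore procedure is not reproduced. As a minimal sketch of what aggregating confidence scores with a probability union rule could look like, the example below treats each per-frame confidence as an independent probability and fuses them as 1 − ∏(1 − pᵢ). The independence assumption, the per-class grouping, and the names `union_score` and `rescore_track` are illustrative choices, not the authors' implementation.

```python
from collections import defaultdict


def union_score(scores):
    """Probability that at least one of several independent events occurs."""
    remaining = 1.0
    for p in scores:
        remaining *= 1.0 - p  # probability that none has occurred so far
    return 1.0 - remaining


def rescore_track(detections):
    """Pick one class label for a track from its per-frame detections.

    `detections` is a list of (class_label, confidence) pairs belonging to
    a single track, for example as linked across frames by a tracker.
    Returns the label with the highest fused score, and that score.
    """
    per_class = defaultdict(list)
    for label, confidence in detections:
        per_class[label].append(confidence)
    fused = {label: union_score(c) for label, c in per_class.items()}
    best = max(fused, key=fused.get)
    return best, fused[best]


# A track seen as "cat" twice with moderate confidence and "dog" once.
label, score = rescore_track([("cat", 0.5), ("dog", 0.6), ("cat", 0.5)])
print(label, round(score, 2))
```

In this sketch two moderate "cat" detections outweigh a single stronger "dog" detection, which illustrates how accumulating evidence along a track can correct a per-frame misclassification. Note that a plain union saturates toward 1 on long tracks; whether and how the paper handles this is not stated in the visible text.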

Code & Models

Repositories

Bomps4/Multi_Resolution_Rescored_ByteTrack/tree/IEEE_Access (GitHub)
