MCUCoder: Adaptive Bitrate Learned Video Compression for IoT Devices
Ali Hojjat, Janek Haberer, Olaf Landsiedel

TL;DR
MCUCoder is a lightweight, adaptive bitrate video compression model designed for resource-constrained IoT devices, significantly reducing bitrate while maintaining quality and supporting real-time streaming under unstable network conditions.
Contribution
Introduces MCUCoder, an ultra-lightweight adaptive video compression model with minimal parameters and memory footprint, optimized for low-resource IoT edge devices.
Findings
Reduces bitrate by over 55% compared to M-JPEG on benchmark datasets.
Uses only 10.5K parameters and a 350KB memory footprint, making it suitable for MCUs.
Supports adaptive streaming with importance-sorted latent representations.
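A minimal sketch of how an importance-sorted latent could support adaptive streaming, under stated assumptions. The summary does not give the training procedure beyond "stochastic dropout" (mentioned in a review below). This sketch therefore assumes a nested, or "tail", dropout scheme: during training, a random-length prefix of channels is kept and the rest are zeroed, so earlier channels survive more often and end up carrying the most information. At streaming time, the sender transmits the longest channel prefix that fits the current byte budget, and the receiver zero-fills the missing channels before decoding. The functions `tail_drop`, `truncate_to_budget`, and `pad_missing`, the list-of-channels representation, and the fixed bytes-per-value cost (which ignores entropy coding) are all hypothetical. This is not MCUCoder's code.

```python
import random
from typing import List

# Hypothetical representation: a latent is a list of channels, and each channel
# is a flat list of quantized values. Channel 0 is assumed most important.
Latent = List[List[int]]


def tail_drop(latent: Latent, rng: random.Random) -> Latent:
    """Training-time tail dropout (assumed scheme, not the paper's exact one).

    Keeps a uniformly random number of leading channels (at least one) and
    zeroes the rest. Channel i survives with probability (C - i) / C, so the
    decoder learns to reconstruct from any prefix and early channels matter most.
    """
    keep = rng.randint(1, len(latent))  # inclusive on both ends
    return [ch if i < keep else [0] * len(ch) for i, ch in enumerate(latent)]


def truncate_to_budget(latent: Latent, bytes_per_value: int, budget_bytes: int) -> Latent:
    """Streaming-time truncation: send the longest channel prefix within budget."""
    sent: Latent = []
    used = 0
    for ch in latent:
        cost = len(ch) * bytes_per_value
        if used + cost > budget_bytes:
            break  # later channels are less important, so stop here
        sent.append(ch)
        used += cost
    return sent


def pad_missing(received: Latent, num_channels: int, channel_len: int) -> Latent:
    """Receiver side: zero-fill channels that were not transmitted."""
    missing = num_channels - len(received)
    return received + [[0] * channel_len for _ in range(missing)]


if __name__ == "__main__":
    latent = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    sent = truncate_to_budget(latent, bytes_per_value=1, budget_bytes=7)
    print(sent)                          # first two channels fit in 7 bytes
    print(pad_missing(sent, 4, 3))       # decoder input with zero-filled tail
```

Because truncation only ever drops a suffix of channels, bitrate adapts without re-encoding: the same encoder output serves every bandwidth level, and quality degrades gracefully as fewer channels arrive.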
Abstract
The rapid growth of camera-based IoT devices creates a need for efficient video compression, particularly for edge applications where devices face hardware constraints, often with only 1 or 2 MB of RAM, and unstable internet connections. Traditional and deep video compression methods are designed for high-end hardware and exceed the capabilities of these constrained devices. Consequently, video compression in these scenarios is often limited to M-JPEG due to its high hardware efficiency and low complexity. This paper introduces MCUCoder, an open-source adaptive bitrate video compression model tailored for resource-limited IoT settings. MCUCoder features an ultra-lightweight encoder with only 10.5K parameters and a minimal 350KB memory footprint, making it well-suited for edge devices and MCUs. While MCUCoder uses a similar amount of energy as M-JPEG, it reduces bitrate by 55.65% on the MCL-JCV…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
Strength: 1) Lightweight Design for IoT Devices. The encoder of MCUCoder is ultra-lightweight, with only 10.5K parameters and a memory footprint of 350KB, making it highly suitable for resource-constrained IoT devices. 2) Variable Bitrate. MCUCoder supports variable bitrate by generating a latent representation sorted by importance, allowing it to adapt efficiently to bandwidth-constrained environments. 3) Energy Efficiency. MCUCoder employs INT8 quantization, enabling it to run on DSP accelerator
Weakness: 1) Lack of Explicit Bitrate Allocation. The concept of pruning output channels to achieve variable bitrate has been studied extensively over the years [1, 2]. As the authors noted, different channels represent features at varying frequencies. However, feature frequency distribution can vary across an image, and simply discarding certain frequencies may significantly degrade specific regions, impacting both human and machine perception. In contrast, advanced video codecs typically employ
1. MCUCoder is highly optimized for low-resource IoT environments, with an encoder that requires only 350KB of RAM and achieves JPEG-level energy efficiency, making it feasible for MCU devices. 2. The model supports adaptive bitrate by sorting latent representations based on importance, enabling smooth transmission even under fluctuating network conditions. 3. The INT8 quantized encoder leverages DSP and CMSIS-NN accelerators, reducing power consumption.
1. The paper’s innovations are limited, as many of the techniques used are adaptations of existing methods. The novelty of the proposed model is relatively low, which could limit its contribution. The contribution section is poorly presented and organized. 2. The motivation and rationale for using stochastic dropout in training are not well explained. Given that it is meant to achieve effects similar to the DCT, it’s unclear why a more established and potentially faster method like the DCT was not empl
(1) This paper explores learned video compression on IoT devices. For a long time, learned image compression (LIC) could not be deployed in practical applications due to its heavy resource consumption, and this study solves that problem to some extent. I think the entry point is novel. (2) The experiments show that MCUCoder clearly improves on M-JPEG across multiple datasets.
(1) From the architecture in Figure 3, I observe that MCUCoder is lightweight mainly because it uses quantization and simple neural network layers. What other means did the authors use to make the model lightweight? Because previous works [1,2] have explored quantization in learning-based compression methods, I believe that quantization and a simple network structure alone may limit the novelty of this paper. (2) In the experiments
1. The paper addresses a crucial challenge in neural compression: achieving low encoding complexity for deploying AI codecs on edge devices. It introduces an ultra-lightweight and energy-efficient INT8 quantized encoder tailored for low-resource IoT devices, which appears to be a practical solution. 2. MCUCoder leverages channel importance to generate a progressive bitstream, enabling adaptive bitrate streaming that can adjust to fluctuating network conditions. 3. The authors provide detailed
1. Weak compression performance: Compared to H.264, there is a significant performance gap. The authors should clarify which specific scenarios require such extremely resource-constrained environments. 2. Missing compression baselines: In the image compression experiments, only JPEG is used as a baseline. It would be beneficial to include additional traditional compression methods such as JPEG2000 and WebP. Additionally, please clarify which traditional image compression algorithms are
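Several reviews attribute the encoder's energy efficiency to INT8 quantization, which lets it use DSP and CMSIS-NN accelerated kernels. The following is a generic sketch of symmetric per-tensor INT8 weight quantization, included only to illustrate the idea. It is not MCUCoder's code, and the function names `quantize_int8_symmetric` and `dequantize` are hypothetical. Production int8 toolchains typically use per-channel weight scales and a zero point for activations; this sketch omits both for brevity.

```python
from typing import List, Tuple


def quantize_int8_symmetric(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization (illustrative sketch).

    A single scale maps the largest-magnitude weight to +/-127, and every
    weight is rounded to the nearest integer in [-127, 127].
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate real values; error per weight is at most scale / 2."""
    return [v * scale for v in q]


if __name__ == "__main__":
    q, scale = quantize_int8_symmetric([0.5, -1.0, 0.25])
    print(q, scale)
    print(dequantize(q, scale))
```

Storing one byte per weight instead of four is where most of the weight-memory saving comes from: 10.5K parameters occupy about 10.5 KB in INT8 versus about 42 KB as float32.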
Taxonomy
Topics: Advanced Data Compression Techniques · Video Coding and Compression Technologies · CCD and CMOS Imaging Sensors
