TL;DR
This paper introduces a hardware-aware knowledge distillation method for training a lightweight image denoising model optimized for mobile NPUs. The student recovers 99.8% of teacher quality with 21.2x fewer parameters.
Contribution
The paper presents an NPU-aware training approach and a lightweight denoising network, built from NPU-native operations, that maintains high fidelity while enabling real-time mobile deployment.
Findings
Achieved 37.66 dB PSNR / 0.9278 SSIM on the validation benchmark.
Recovered 99.8% of teacher quality with 21.2x fewer parameters.
NPU-compatible operations enable up to 3.88x faster inference on the NPU than on GPU.
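For readers unfamiliar with the metric behind these numbers, PSNR is computed as 10 * log10(peak^2 / MSE). The following is a minimal sketch, assuming 8-bit images flattened into lists of pixel values; the function name `psnr` and the toy data are illustrative and do not come from the paper or the challenge's evaluation code.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy data (assumption): every pixel is off by one, so MSE = 1.
clean = [100, 120, 140, 160]
denoised = [101, 119, 141, 159]
print(round(psnr(clean, denoised), 2))  # 48.13
```

Because the scale is logarithmic, a small difference in dB corresponds to a multiplicative change in mean squared error.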
Abstract
While deep-learning-based image restoration has achieved unprecedented fidelity, deployment on mobile Neural Processing Units (NPUs) remains bottlenecked by operator incompatibility and memory-access overhead. We propose an NPU-aware hardware-algorithm co-design approach for real-world image denoising on mobile NPUs. Our approach employs a high-capacity teacher to supervise a lightweight student network specifically designed to leverage the tiled-memory architectures of modern mobile SoCs. By prioritizing NPU-native primitives -- standard 3x3 convolutions, ReLU activations, and nearest-neighbor upsampling -- and employing a progressive context expansion strategy (up to 1024x1024 crops), the model achieves 37.66 dB PSNR / 0.9278 SSIM on the validation benchmark and 37.58 dB PSNR / 0.9098 SSIM on the held-out test benchmark at full resolution (2432x3200) in the Mobile AI 2026 challenge.…
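The three NPU-native primitives named in the abstract can be illustrated in a few lines. This is a single-channel, pure-Python sketch for intuition only, assuming zero padding and stride 1; it is not the paper's network, which would be multi-channel and built in a deep-learning framework. The function names `conv3x3`, `relu`, and `upsample_nearest` are hypothetical.

```python
def conv3x3(img, kernel):
    """3x3 cross-correlation with zero padding and stride 1 (output size = input size)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # outside the image counts as zero
                        acc += img[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = acc
    return out

def relu(img):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in img]

def upsample_nearest(img, scale=2):
    """Nearest-neighbor upsampling: repeat each pixel `scale` times along both axes."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out

# Toy input (assumption): an identity kernel makes the effect of each step easy to follow.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
x = [[1.0, -2.0], [3.0, 4.0]]
y = upsample_nearest(relu(conv3x3(x, identity)))
for row in y:
    print(row)
```

All three operations are simple, regular access patterns with no data-dependent branching, which is consistent with the abstract's point about avoiding operator incompatibility on mobile NPUs.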
