MicroViTv2: Beyond the FLOPS for Edge Energy-Friendly Vision Transformers

Novendra Setyawan; Chi-Chia Sun; Mao-Hsiu Hsu; Wen-Kai Kuo; Jun-Wei Hsieh

arXiv:2605.10148 · cs.CV · May 12, 2026

TL;DR

MicroViTv2 is a lightweight, energy-efficient Vision Transformer for edge devices that reaches higher accuracy and real-device efficiency through structural reparameterization and a new transposed attention module (SDTA).

Contribution

The paper introduces MicroViTv2, a reparameterized Vision Transformer that uses reparameterized modules (RepEmbed, RepDW) for faster inference and a Single Depth-Wise Transposed Attention (SDTA) module for improved accuracy on edge hardware.

Findings

1. MicroViTv2 surpasses MobileViTv2, EdgeNeXt, and EfficientViT in accuracy.

2. It maintains fast inference and high energy efficiency on the Jetson AGX Orin.

3. Structural reparameterization delivers real-device gains that FLOP counts alone do not capture.
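Structural reparameterization trains a block with several parallel branches and then folds them into a single convolution for inference, so the extra branches cost nothing at deployment time. The page does not describe RepDW's internal branches, so the sketch below assumes a RepVGG/RepViT-style single-channel depthwise block made of a 3x3 conv + BatchNorm branch, a 1x1 conv + BatchNorm branch, and an identity shortcut. All function names (`conv2d_same`, `fuse_bn`, `multi_branch`, `reparameterize`) are hypothetical and are not taken from the MicroViT repository. It is a sketch under those assumptions, written in pure Python so it needs no framework.

```python
import math
import random


def conv2d_same(x, k, bias=0.0):
    """Single-channel 2D cross-correlation with zero padding ('same' output size).
    One channel of a depthwise conv is exactly this operation."""
    h, w = len(x), len(x[0])
    ph, pw = len(k) // 2, len(k[0]) // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for u, krow in enumerate(k):
                for v, kval in enumerate(krow):
                    ii, jj = i + u - ph, j + v - pw
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kval * x[ii][jj]
            row.append(acc)
        out.append(row)
    return out


def bn(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm on one channel, using running statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [[(val - mean) * scale + beta for val in row] for row in x]


def fuse_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into the preceding bias-free conv: returns (kernel, bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [[w * scale for w in row] for row in k], beta - mean * scale


def pad_to_3x3(k1):
    """Place a 1x1 kernel at the centre of an all-zero 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, k1[0][0], 0.0], [0.0, 0.0, 0.0]]


def multi_branch(x, p):
    """Assumed training-time form: BN(DW3x3(x)) + BN(DW1x1(x)) + x."""
    y3 = bn(conv2d_same(x, p["k3"]), *p["bn3"])
    y1 = bn(conv2d_same(x, p["k1"]), *p["bn1"])
    return [[a + b + c for a, b, c in zip(r3, r1, r0)]
            for r3, r1, r0 in zip(y3, y1, x)]


def reparameterize(p):
    """Collapse all three branches into one 3x3 kernel plus a scalar bias."""
    k3, b3 = fuse_bn(p["k3"], *p["bn3"])
    k1, b1 = fuse_bn(p["k1"], *p["bn1"])
    k1 = pad_to_3x3(k1)
    identity = pad_to_3x3([[1.0]])  # the shortcut is a centre-one kernel
    kernel = [[k3[u][v] + k1[u][v] + identity[u][v] for v in range(3)]
              for u in range(3)]
    return kernel, b3 + b1


def rand():
    return random.uniform(-1.0, 1.0)


random.seed(0)
params = {
    "k3": [[rand() for _ in range(3)] for _ in range(3)],
    "k1": [[rand()]],
    "bn3": (1.2, 0.1, 0.05, 0.9),   # gamma, beta, running mean, running var
    "bn1": (0.8, -0.2, 0.01, 1.1),
}
x = [[rand() for _ in range(5)] for _ in range(5)]

kernel, bias = reparameterize(params)
y_train = multi_branch(x, params)          # three branches
y_infer = conv2d_same(x, kernel, bias)     # one fused conv
max_err = max(abs(a - b) for ra, rb in zip(y_train, y_infer) for a, b in zip(ra, rb))
print(max_err < 1e-9)  # → True
```

The fused kernel gives the same output as the three-branch block (up to floating-point rounding) while running a single 3x3 depthwise conv. This is one reason a reparameterized model can be faster on device than its FLOP count at training time would suggest.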

Abstract

The Vision Transformer (ViT) achieves remarkable accuracy across visual tasks but remains computationally expensive for edge deployment. This paper presents MicroViTv2, a lightweight Vision Transformer optimized for real-device efficiency. Built upon the original MicroViT, the proposed model adopts a reparameterized design, specifically Reparameterized Patch Embedding (RepEmbed) and a Reparameterized Depth-Wise convolution mixer (RepDW), for faster inference. It also introduces Single Depth-Wise Transposed Attention (SDTA) to capture long-range dependencies with minimal redundancy. Despite slightly higher FLOPs, MicroViTv2 improves accuracy by up to 0.5% over its predecessor and surpasses MobileViTv2, EdgeNeXt, and EfficientViT while maintaining fast inference and high energy efficiency on Jetson AGX Orin. Experiments on ImageNet-1K and COCO demonstrate that hardware-aware…
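The abstract names SDTA but does not give its internals. As a hedged illustration of what "transposed attention" usually means (for example in XCiT and EdgeNeXt), the sketch below computes attention across channels instead of across tokens. The result is a C x C attention map whose size does not depend on the number of tokens N, so the cost is O(N·C²) rather than O(N²·C). The sketch omits the depthwise convolution, the head split, and the learnable temperature that a real SDTA block would likely include. The function `transposed_attention` is a hypothetical name, not the authors' code.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def l2_normalize_columns(m, eps=1e-12):
    """Normalise each channel (column) to unit L2 norm across the N tokens."""
    normed = []
    for col in transpose(m):
        norm = math.sqrt(sum(val * val for val in col)) + eps
        normed.append([val / norm for val in col])
    return transpose(normed)


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)  # subtract the max for numerical stability
        exps = [math.exp(val - mx) for val in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def transposed_attention(q, k, v, temperature=1.0):
    """Channel ('transposed') attention over N x C query/key/value matrices."""
    qn = l2_normalize_columns(q)
    kn = l2_normalize_columns(k)
    # (C x N) @ (N x C) -> C x C channel-to-channel similarities
    logits = matmul(transpose(qn), kn)
    attn = softmax_rows([[s / temperature for s in row] for row in logits])
    # out[n][c] = sum_d v[n][d] * attn[c][d], i.e. V @ A^T -> N x C
    return matmul(v, transpose(attn))


N, C = 6, 3
q = [[math.sin(n + 2 * c) for c in range(C)] for n in range(N)]
k = [[math.cos(n - c) for c in range(C)] for n in range(N)]
v = [[float(n)] * C for n in range(N)]  # every channel holds the same value per token
out = transposed_attention(q, k, v)
print(len(out), len(out[0]))  # → 6 3
```

Each softmax row sums to one, so every output channel is a convex mix of the value channels. In the usage above, all value channels are equal per token, so `out` reproduces `v`. This gives a quick sanity check that the mixing is over channels, not tokens.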

Code & Models

Repositories

novendrastywn/MicroViT (GitHub)
