Hybrid Convolution and Vision Transformer NAS Search Space for TinyML Image Classification

Mikhael Djajapermana; Moritz Reiber; Daniel Mueller-Gritschneder; Ulf Schlichtmann

arXiv:2511.02992 · cs.CV · January 1, 2026

Open Access

TL;DR

This paper proposes a new hybrid CNN-ViT search space for NAS to develop efficient image classification models suitable for tinyML, balancing accuracy and computational constraints.

Contribution

A novel hybrid CNN-ViT search space for NAS that includes local, global, and pooling blocks tailored for tinyML deployment.

Findings

01 Achieved higher accuracy than ResNet-based tinyML models.

02 Produced architectures with faster inference than ResNet-based tinyML models under tight model size constraints.

03 Demonstrated effectiveness on the CIFAR10 dataset.

Abstract

Hybrids of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have outperformed pure CNN or ViT architectures. However, since these architectures require many parameters and incur large computational costs, they are unsuitable for tinyML deployment. This paper introduces a new hybrid CNN-ViT search space for Neural Architecture Search (NAS) to find efficient hybrid architectures for image classification. The search space covers hybrid CNN and ViT blocks to learn local and global information, as well as a novel Pooling block of searchable pooling layers for efficient feature map reduction. Experimental results on the CIFAR10 dataset show that the proposed search space can produce hybrid CNN-ViT architectures with higher accuracy and faster inference than ResNet-based tinyML models under tight model size constraints.
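
To make the idea of a constrained hybrid search space concrete, the following is a minimal sketch and not the authors' implementation. It assumes a stage layout of local (convolution) block, global (attention) block, then pooling block, and it uses plain random search as a stand-in for whatever NAS strategy the paper uses. All option lists, the names `SEARCH_SPACE`, `Block`, `sample_architecture`, `estimate_params`, and `random_search`, and the parameter-count formula are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical option lists; the abstract does not enumerate the real choices.
SEARCH_SPACE = {
    "local": {"kernel_size": [3, 5], "channels": [16, 32, 64]},
    "global": {"num_heads": [1, 2, 4], "embed_dim": [32, 64]},
    "pooling": {"pool_type": ["max", "avg"], "stride": [1, 2]},
}


@dataclass(frozen=True)
class Block:
    kind: str      # "local", "global", or "pooling"
    params: tuple  # sorted (name, value) pairs


def sample_block(kind, rng):
    """Pick one value per searchable option of the given block kind."""
    options = SEARCH_SPACE[kind]
    return Block(kind, tuple(sorted((k, rng.choice(v)) for k, v in options.items())))


def sample_architecture(rng, num_stages=3):
    """Assumed layout: each stage is local -> global -> pooling."""
    return [
        sample_block(kind, rng)
        for _ in range(num_stages)
        for kind in ("local", "global", "pooling")
    ]


def estimate_params(arch, in_channels=3):
    """Rough weight-count proxy (an assumption, not a measured model size)."""
    total, c = 0, in_channels
    for block in arch:
        p = dict(block.params)
        if block.kind == "local":
            # k x k convolution with bias
            total += p["kernel_size"] ** 2 * c * p["channels"] + p["channels"]
            c = p["channels"]
        elif block.kind == "global":
            d = p["embed_dim"]
            total += c * d + d          # linear projection to embed_dim
            total += 4 * d * d + 4 * d  # Q, K, V, and output projections
            c = d
        # pooling blocks contribute no weights
    return total


def random_search(budget, trials=100, seed=0):
    """Keep only sampled architectures that fit the parameter budget."""
    rng = random.Random(seed)
    candidates = (sample_architecture(rng) for _ in range(trials))
    return [a for a in candidates if estimate_params(a) <= budget]
```

In a real search, each feasible candidate would then be trained or scored for accuracy and latency; the budget check here only illustrates how a model size constraint can prune the space before any training happens.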

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Big Data and Digital Economy