Joint Architecture-Token-Bitwidth Multi-Axis Optimization of Vision Transformers for Semiconductor IC Packaging

Phat Nguyen; Xue Geng; Kaixin Xu; Wang Zhe; Xulei Yang; Ngai-Man Cheung

arXiv:2605.01742·cs.CV·May 5, 2026

TL;DR

This paper introduces a holistic multi-axis optimization framework for Vision Transformers, jointly tuning architecture, token processing, and bit-width to significantly improve efficiency for industrial applications without sacrificing accuracy.

Contribution

The paper is among the first to jointly optimize architecture, tokens, and bit-width in ViTs, with a specific focus on resource-efficient deployment in semiconductor manufacturing.

Findings

01. Achieves over 10x throughput improvement and 10x reductions in parameters, FLOPs, and energy consumption.

02. Maintains accuracy on industrial defect classification tasks despite aggressive compression.

03. Demonstrates the effectiveness of combined multi-axis optimization in real-world industrial scenarios.

Abstract

Vision Transformers (ViTs) have achieved strong performance in visual recognition, yet their deployment in resource-constrained industrial environments remains limited. The main obstacles are their high computational cost, memory requirements, and energy consumption. While individual efficiency techniques such as neural architecture search (NAS), token compression, and low-precision inference have been extensively studied, most prior work targets only a single optimization axis, which limits the deployment gains achievable while preserving accuracy. In this paper, we present one of the first holistic frameworks that jointly optimizes three complementary axes: architecture, token, and bit-width. Specifically, the framework identifies compact backbones via Neural Architecture Search (AutoFormer), reduces information processing via token merging (ToMe), and accelerates per-operation execution via…
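
To make the token axis concrete, here is a minimal sketch of a ToMe-style merge step in pure Python. It is not the authors' implementation. It assumes tokens are plain lists of floats, uses cosine similarity on the tokens themselves (ToMe as published matches on attention keys and tracks token sizes for weighted averaging), and the function names `cosine` and `merge_tokens` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_tokens(tokens, r):
    """Merge the r most similar (A, B) token pairs; returns a new token list.

    Simplified bipartite matching: tokens at even positions form set A,
    tokens at odd positions form set B. Each A token proposes its most
    similar B token; the r highest-scoring proposals are merged by
    unweighted averaging (an assumption made for brevity).
    """
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    r = min(r, len(a_idx))
    if r <= 0 or not b_idx:
        return [list(t) for t in tokens]

    # Each A token picks its best match in B.
    proposals = []
    for i in a_idx:
        score, j = max((cosine(tokens[i], tokens[j]), j) for j in b_idx)
        proposals.append((score, i, j))
    proposals.sort(key=lambda p: p[0], reverse=True)

    # Accumulate merged A tokens into their B destinations.
    sums = {j: (list(tokens[j]), 1) for j in b_idx}
    removed = set()
    for _, i, j in proposals[:r]:
        vec, count = sums[j]
        sums[j] = ([v + x for v, x in zip(vec, tokens[i])], count + 1)
        removed.add(i)

    # Rebuild the sequence in original order, without the merged A tokens.
    out = []
    for k, tok in enumerate(tokens):
        if k in removed:
            continue
        if k in sums:
            vec, count = sums[k]
            out.append([v / count for v in vec])
        else:
            out.append(list(tok))
    return out


tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.1]]
reduced = merge_tokens(tokens, r=1)  # one fewer token than the input
```

Each call shortens the sequence by up to `r` tokens, so every later block that receives the reduced sequence processes fewer tokens. That is the sense in which token merging reduces the amount of information processed.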
