Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design

Jian Meng; Yuan Liao; Anupreetham Anupreetham; Ahmed Hasssan; Shixing Yu; Han-sok Suh; Xiaofeng Hu; Jae-sun Seo

arXiv:2405.01775·cs.AR·May 7, 2024

TL;DR

Torch2Chip is an open-source toolkit enabling customizable neural network compression and deployment, facilitating prototype hardware accelerator design with integrated model fusion and parameter extraction for ASIC or FPGA chips.

Contribution

It introduces a fully customizable, end-to-end toolkit supporting user-defined compression algorithms and automatic model fusion for hardware accelerator prototyping.

Findings

01. Supports both CNN and ViT models.
02. Enables direct packing of custom compression into deployment format.
03. Facilitates prototype chip verification with high performance.

Abstract

The development of model compression is continuously motivated by the evolution of neural network accelerators built on ASICs or FPGAs. On the algorithm side, the ultimate goal of quantization or pruning is to accelerate expensive DNN computations on low-power hardware. However, such a "design-and-deploy" workflow faces under-explored challenges in the current hardware-algorithm co-design community. First, although state-of-the-art quantization algorithms can achieve low precision with negligible accuracy degradation, the latest deep learning frameworks (e.g., PyTorch) support only non-customizable 8-bit precision, data formats, and parameter extraction. Second, the objective of quantization is to enable computation with low-precision data. However, current SoTA algorithms treat the quantized integer as an intermediate result, while the final output of the…
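The gap the abstract describes, between treating quantized integers as an intermediate result and actually computing with them, can be illustrated with a small sketch. This is not Torch2Chip's API: the function names (`quantize`, `fake_quant_dot`, `integer_dot`) are hypothetical, and symmetric per-tensor quantization is an assumed scheme chosen only for brevity.

```python
# Illustrative sketch only; all names here are hypothetical, not Torch2Chip's API.
# Assumes symmetric, per-tensor uniform quantization.

def quantize(values, n_bits=4):
    """Map floats to signed integer codes and return (codes, scale)."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the signed n-bit range.
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to floats."""
    return [c * scale for c in codes]

def fake_quant_dot(w, x, n_bits=4):
    """'Fake' quantization: the integer codes are only an intermediate step.
    Weights are dequantized and the dot product runs in floating point."""
    codes, scale = quantize(w, n_bits)
    w_hat = dequantize(codes, scale)
    return sum(wi * xi for wi, xi in zip(w_hat, x))

def integer_dot(w, x, n_bits=4):
    """Integer-only path: multiply-accumulate on integer codes, then apply
    a single rescale. Returns (integer accumulator, rescaled float)."""
    w_codes, w_scale = quantize(w, n_bits)
    x_codes, x_scale = quantize(x, n_bits)
    acc = sum(wc * xc for wc, xc in zip(w_codes, x_codes))
    return acc, acc * w_scale * x_scale

weights = [0.7, -0.35, 0.1]
inputs = [1.0, 2.0, -1.0]
print(fake_quant_dot(weights, inputs))
print(integer_dot(weights, inputs))
```

In the first path the hardware-relevant integers never leave the function; in the second, the integer codes, the accumulator, and the scales are exactly the values an accelerator prototype would need extracted.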

Code & Models

Repositories

seolabcornell/torch2chip (PyTorch, official)

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Residual Connection · Softmax · Vision Transformer · Pruning