Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Jian Meng, Yuan Liao, Anupreetham Anupreetham, Ahmed Hasssan, Shixing Yu, Han-sok Suh, Xiaofeng Hu, Jae-sun Seo

TL;DR
Torch2Chip is an open-source toolkit enabling customizable neural network compression and deployment, facilitating prototype hardware accelerator design with integrated model fusion and parameter extraction for ASIC or FPGA chips.
Contribution
It introduces a fully customizable, end-to-end toolkit supporting user-defined compression algorithms and automatic model fusion for hardware accelerator prototyping.
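A common form of model fusion for integer deployment folds the input and weight quantization scales, the layer bias, and a following BatchNorm into one multiplier M and one bias B that act directly on the integer accumulator. For a single output channel, with γ, β, μ, σ² as the BatchNorm parameters, y = γ(s_x·s_w·acc + b − μ)/√(σ² + ε) + β = M·acc + B. The sketch below checks this in plain Python. It assumes this fusion pattern and is not Torch2Chip's actual fusion code. The names (`BatchNormParams`, `fuse_scale_bias`, `reference_output`, `fused_output`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BatchNormParams:
    """Per-channel BatchNorm statistics and affine parameters (hypothetical container)."""
    gamma: float
    beta: float
    mean: float
    var: float
    eps: float = 1e-5


def fuse_scale_bias(s_x: float, s_w: float, bias: float, bn: BatchNormParams):
    """Fold input scale, weight scale, layer bias and BatchNorm into (M, B)."""
    inv_std = 1.0 / math.sqrt(bn.var + bn.eps)
    m = bn.gamma * s_x * s_w * inv_std
    b = bn.gamma * (bias - bn.mean) * inv_std + bn.beta
    return m, b


def reference_output(acc: int, s_x: float, s_w: float, bias: float, bn: BatchNormParams) -> float:
    """Unfused path: dequantize the accumulator, add bias, then apply BatchNorm."""
    y = s_x * s_w * acc + bias
    return bn.gamma * (y - bn.mean) / math.sqrt(bn.var + bn.eps) + bn.beta


def fused_output(acc: int, m: float, b: float) -> float:
    """Fused path: a single multiply-add on the integer accumulator."""
    return m * acc + b


if __name__ == "__main__":
    bn = BatchNormParams(gamma=1.5, beta=-0.2, mean=0.3, var=0.8)
    m, b = fuse_scale_bias(s_x=0.02, s_w=0.005, bias=0.1, bn=bn)
    acc = 1234  # integer MAC result from the accelerator
    print(fused_output(acc, m, b), reference_output(acc, 0.02, 0.005, 0.1, bn))
```

After fusion, the hardware only needs the integer accumulator plus one scale and one offset per channel. In practice M is often further approximated by a fixed-point multiply and shift. That step is omitted here.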
Findings
Supports both CNN and ViT models.
Enables direct packing of custom compression into deployment format.
Facilitates prototype chip verification with high performance.
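For prototype chip verification, extracted integer parameters are often written as plain text that an RTL testbench can load into memory. The sketch below assumes one two's-complement hex word per line, which is a layout Verilog's `$readmemh` accepts. The function names and file format are illustrative assumptions, not Torch2Chip's documented export format.

```python
from pathlib import Path


def to_hex_word(value: int, n_bits: int) -> str:
    """Encode a signed integer as a zero-padded two's-complement hex string."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {n_bits} signed bits")
    width = (n_bits + 3) // 4  # number of hex digits needed for n_bits
    return format(value & ((1 << n_bits) - 1), f"0{width}x")


def export_hex(int_params: list[int], n_bits: int) -> str:
    """Serialize quantized parameters, one hex word per line."""
    return "".join(to_hex_word(v, n_bits) + "\n" for v in int_params)


def write_hex(path: str, int_params: list[int], n_bits: int) -> None:
    """Write the hex dump to a file for a testbench to read."""
    Path(path).write_text(export_hex(int_params, n_bits))


if __name__ == "__main__":
    # Four 8-bit weights -> lines "03", "fe", "7f", "80"
    print(export_hex([3, -2, 127, -128], n_bits=8), end="")
```

The word width follows the chosen bit width, so 4-bit or 6-bit parameters are exported just as easily as 8-bit ones.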
Abstract
The development of model compression is continuously motivated by the evolution of neural network accelerators built on ASIC or FPGA. On the algorithm side, the ultimate goal of quantization or pruning is to accelerate expensive DNN computations on low-power hardware. However, such a "design-and-deploy" workflow faces under-explored challenges in the current hardware-algorithm co-design community. First, although state-of-the-art quantization algorithms can achieve low precision with negligible accuracy degradation, the latest deep learning frameworks (e.g., PyTorch) only support non-customizable 8-bit precision, data formats, and parameter extraction. Second, the objective of quantization is to enable computation with low-precision data. However, current SoTA algorithms treat the quantized integer as an intermediate result, while the final output of the…
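The contrast between treating the quantized integer as an intermediate value and computing with it directly can be illustrated with a dot product. This sketch assumes symmetric per-tensor quantization with a user-chosen bit width, and the paper's quantizers may differ. Under that assumption, a "fake-quantized" dot product that dequantizes each operand back to float gives the same value as an integer-only multiply-accumulate followed by a single rescale by s_x·s_w. All helper names are hypothetical.

```python
def quant_params(values: list[float], n_bits: int):
    """Symmetric per-tensor scale and clipping bound for a signed n-bit grid."""
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return scale, qmax


def quantize(values: list[float], scale: float, qmax: int) -> list[int]:
    """Round to the integer grid and clip to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def fake_quant_dot(x: list[float], w: list[float], n_bits: int) -> float:
    """Quantize then immediately dequantize: the integers are only intermediates."""
    sx, qx = quant_params(x, n_bits)
    sw, qw = quant_params(w, n_bits)
    xq = [q * sx for q in quantize(x, sx, qx)]  # back to floating point
    wq = [q * sw for q in quantize(w, sw, qw)]
    return sum(a * b for a, b in zip(xq, wq))


def integer_dot(x: list[float], w: list[float], n_bits: int):
    """Integer-only MAC; returns the accumulator and the combined output scale."""
    sx, qx = quant_params(x, n_bits)
    sw, qw = quant_params(w, n_bits)
    xi = quantize(x, sx, qx)
    wi = quantize(w, sw, qw)
    acc = sum(a * b for a, b in zip(xi, wi))  # pure integer arithmetic
    return acc, sx * sw


if __name__ == "__main__":
    x = [0.5, -1.2, 0.33, 2.0]
    w = [0.1, 0.25, -0.4, 0.05]
    for n_bits in (4, 8):
        acc, scale = integer_dot(x, w, n_bits)
        print(n_bits, acc, acc * scale, fake_quant_dot(x, w, n_bits))
```

Only the integer accumulator and one combined scale leave the integer path, and the bit width is a free parameter rather than being fixed to 8 bits.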
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Residual Connection · Softmax · Vision Transformer · Pruning
