Efficient Column-Wise N:M Pruning on RISC-V CPU

Chi-Wei Chu; Ding-Yong Hong; Jan-Jan Wu

arXiv:2507.17301·cs.DC·July 24, 2025

Open Access

TL;DR

This paper introduces a column-wise N:M pruning method optimized for RISC-V CPUs, raising ResNet inference throughput by up to 4.0x while keeping top-1 accuracy within 2.1% of the baseline.

Contribution

It combines a tile-level column-wise N:M pruning strategy with XNNPACK modifications for the RISC-V vector architecture and a fused im2col and data-packing operation for efficient CNN inference.

Findings

01. ResNet inference throughput increased by up to 4.0x

02. Top-1 accuracy preserved within 2.1% of baseline

03. Fusing im2col with data packing reduces redundant memory accesses and memory overhead

Abstract

In deep learning frameworks, weight pruning is a widely used technique for improving computational efficiency by reducing the size of large models. This is especially critical for convolutional operators, which often act as performance bottlenecks in convolutional neural networks (CNNs). However, the effectiveness of pruning heavily depends on how it is implemented, as different methods can significantly impact both computational performance and memory footprint. In this work, we propose a column-wise N:M pruning strategy applied at the tile level and modify XNNPACK to enable efficient execution of pruned models on the RISC-V vector architecture. Additionally, we propose fusing the operations of im2col and data packing to minimize redundant memory accesses and memory overhead. To further optimize performance, we incorporate AITemplate's profiling technique to identify the optimal…
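As a rough illustration of what column-wise N:M pruning at the tile level could look like, the sketch below splits a weight matrix into row tiles and, within each tile, keeps N of every M consecutive columns while zeroing the rest. The abstract does not spell out the selection criterion or the tile shape, so the L1-norm scoring, the `tile_rows` parameter, and the function name `prune_columns_n_m` are all assumptions made for this example, not the authors' implementation.

```python
def prune_columns_n_m(weights, n, m, tile_rows):
    """Keep n of every m consecutive columns, decided independently per row tile.

    Hypothetical helper: the L1-norm score and the tile shape are assumptions.
    `weights` is a list of equal-length rows (a dense 2-D weight matrix).
    """
    rows, cols = len(weights), len(weights[0])
    if cols % m != 0:
        raise ValueError("column count must be a multiple of m")
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")

    pruned = [row[:] for row in weights]  # leave the input untouched
    for r0 in range(0, rows, tile_rows):
        tile = range(r0, min(r0 + tile_rows, rows))
        for c0 in range(0, cols, m):
            group = range(c0, c0 + m)
            # Score each column of the group by its L1 norm over the tile
            # (assumed criterion; other importance scores are possible).
            scores = [(sum(abs(weights[r][c]) for r in tile), c) for c in group]
            # Highest score first; ties go to the lower column index.
            ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
            keep = {c for _, c in ranked[:n]}
            # Zero whole columns within the tile, so every row of the tile
            # shares the same sparsity pattern.
            for c in group:
                if c not in keep:
                    for r in tile:
                        pruned[r][c] = 0.0
    return pruned


if __name__ == "__main__":
    w = [[1.0, -5.0, 2.0, 0.5],
         [3.0, 4.0, -1.0, 0.2]]
    for row in prune_columns_n_m(w, n=2, m=4, tile_rows=2):
        print(row)
```

Because every row in a tile drops the same columns, the surviving columns can be stored contiguously with one shared index list per tile, which is the kind of regular layout that suits vector loads. How the paper actually packs and indexes the pruned weights is not stated in the text above.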

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Algorithms and Data Compression