Realize special instructions on clustering VLIW DSP: multiplication-accumulation instruction
Binbin Liu, Qilong Zheng

TL;DR
This paper presents a new compilation algorithm for BWDSP, a VLIW DSP, that lets the compiler generate the processor's multiplication-accumulation instructions, with reported speedups of up to 8.85x.
Contribution
It introduces a specialized instruction algorithm tailored for BWDSP's architecture within the Open64 compiler, enabling optimized multiply-accumulate operations.
Findings
Achieves up to 8.85x speedup on BWDSP
Enhances performance for multiply-accumulate algorithms
Addresses the lack of compiler support for BWDSP's special instructions
Abstract
BWDSP is a 32-bit static scalar digital signal processor with VLIW and SIMD features, designed for high-performance computing. Special instructions are provided to match its architecture and application scenarios. However, the existing compilation framework does not support these special instructions. Therefore, building on the traditional Open64 compiler, this paper proposes a special-instruction algorithm. The algorithm implements the multiplication-accumulation operation on the BWDSP architecture to improve the performance of algorithms that rely on multiply-accumulate. Experimental results show that the algorithm achieves a maximum speedup of 8.85x on BWDSP.
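For readers unfamiliar with the operation, the following is a minimal sketch of the kind of source-level multiply-accumulate loop that a compiler pass of this sort would typically target. It is plain portable C and is not taken from the paper: the function name `dot_mac`, the 16-bit input and 32-bit accumulator widths, and the dot-product shape are all assumptions chosen for illustration, and no BWDSP intrinsics or Open64 internals are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel name and operand widths; illustrative only. */
int32_t dot_mac(const int16_t *a, const int16_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    int32_t acc = 0; /* accumulator carried across loop iterations */

    for (size_t i = 0; i < n; ++i) {
        /* A multiply whose result is added into the same accumulator:
           the pattern a MAC-aware compiler could map to one instruction
           instead of a separate multiply and add. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

Kernels such as FIR filters, correlations, and matrix products reduce to this same accumulate-a-product pattern, which is why dedicated hardware support for it tends to matter on DSPs.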
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Algorithms and Data Compression
