Realize special instructions on clustering VLIW DSP: multiplication-accumulation instruction

Binbin Liu; Qilong Zheng

arXiv:1902.05982·cs.OH·February 19, 2019


TL;DR

This paper presents a compilation algorithm for BWDSP, a clustered VLIW DSP, that implements multiplication-accumulation instructions and improves the performance of multiply-accumulate workloads.

Contribution

It introduces a special-instruction algorithm, built within the Open64 compiler and tailored to BWDSP's architecture, that lets the compiler generate multiply-accumulate operations.

Findings

01

Achieves up to 8.85x speedup on BWDSP

02

Enhances performance for multiply-accumulate algorithms

03

Addresses compiler support for special instructions

Abstract

BWDSP is a 32-bit static scalar digital signal processor with VLIW and SIMD features, designed for high-performance computing. Special instructions have been designed for its particular architecture and application scenarios. However, the existing compilation framework does not support these special instructions. Therefore, within the traditional Open64 compiler, we propose a special-instruction algorithm. This algorithm implements the multiplication-accumulation operation on the BWDSP architecture to improve the performance of algorithms with multiply-accumulate requirements. Experimental results show that the algorithm achieves a maximum speedup of 8.85x on BWDSP.
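The paper's own algorithm is not reproduced here. As a rough illustration of the general idea behind generating multiply-accumulate instructions, the sketch below rewrites a multiply that feeds an add into a single fused MAC node in a tiny expression tree. The IR (`Var`, `Op`), the op names, and the `fuse_mac` pass are all hypothetical; this is not Open64's WHIRL IR or the BWDSP instruction set, and it ignores real-compiler concerns such as whether the product has other uses, saturation/overflow semantics, and cluster assignment.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical minimal expression IR for illustration only (not Open64 WHIRL).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    kind: str    # "add", "mul", or "mac" (hypothetical op names)
    args: tuple  # operand nodes


Node = Union[Var, Op]


def fuse_mac(node: Node) -> Node:
    """Bottom-up rewrite: add(acc, mul(x, y)) or add(mul(x, y), acc) -> mac(acc, x, y)."""
    if isinstance(node, Var):
        return node
    args = tuple(fuse_mac(a) for a in node.args)
    if node.kind == "add":
        left, right = args
        if isinstance(right, Op) and right.kind == "mul":
            return Op("mac", (left,) + right.args)
        if isinstance(left, Op) and left.kind == "mul":
            return Op("mac", (right,) + left.args)
    return Op(node.kind, args)


def evaluate(node: Node, env: dict) -> int:
    """Reference semantics; mac(acc, x, y) computes acc + x * y."""
    if isinstance(node, Var):
        return env[node.name]
    vals = [evaluate(a, env) for a in node.args]
    if node.kind == "add":
        return vals[0] + vals[1]
    if node.kind == "mul":
        return vals[0] * vals[1]
    if node.kind == "mac":
        acc, x, y = vals
        return acc + x * y
    raise ValueError(f"unknown op {node.kind}")


def count_ops(node: Node) -> int:
    """Number of operation nodes (a rough proxy for instruction count)."""
    if isinstance(node, Var):
        return 0
    return 1 + sum(count_ops(a) for a in node.args)


# Usage: s + a*b + c*d, a two-term dot-product accumulation.
expr = Op("add", (
    Op("add", (Var("s"), Op("mul", (Var("a"), Var("b"))))),
    Op("mul", (Var("c"), Var("d"))),
))
fused = fuse_mac(expr)
env = {"s": 1, "a": 2, "b": 3, "c": 4, "d": 5}
```

Here the four-operation tree (two adds, two muls) becomes a chain of two MACs, `mac(mac(s, a, b), c, d)`, and both forms evaluate to 27 for the sample inputs. Chained accumulations like this are typical of dot products and FIR filters, which is the kind of multiply-accumulate workload the abstract targets.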


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Algorithms and Data Compression