A Variable Vector Length SIMD Architecture for HW/SW Co-designed Processors

Rakesh Kumar; Alejandro Martinez; Antonio Gonzalez

arXiv:2102.13410·cs.AR·March 1, 2021·1 cites

TL;DR

This paper introduces a flexible SIMD architecture with variable vector lengths and selective writing, improving hardware utilization and performance in HW/SW co-designed processors for data parallel applications.

Contribution

It proposes a novel variable length SIMD architecture and associated vectorization techniques to enhance scalability and efficiency of SIMD accelerators.
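The two ideas named above, variable vector length and selective writing, can be illustrated with a small software model. This is only a sketch under stated assumptions: the function names `vector_add_variable_length` and `selective_write`, the `max_vl` parameter, and the list-based "lanes" are hypothetical and are not taken from the paper's instruction set, which this summary does not describe.

```python
def vector_add_variable_length(a, b, max_vl=8):
    """Add two equal-length lists in strips of at most max_vl lanes.

    Hypothetical model: each loop iteration stands in for one vector
    instruction whose active length vl is chosen at run time.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [0] * n
    strips = 0
    i = 0
    while i < n:
        # The last strip uses a shorter vector length instead of
        # falling back to a scalar remainder loop.
        vl = min(max_vl, n - i)
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
        strips += 1
    return out, strips


def selective_write(dest, src, mask):
    """Return dest with src written only into lanes where mask is true."""
    return [s if m else d for d, s, m in zip(dest, src, mask)]


if __name__ == "__main__":
    result, strips = vector_add_variable_length(list(range(10)), [10] * 10, max_vl=4)
    print(result, strips)
    print(selective_write([0, 0, 0, 0], [1, 2, 3, 4], [True, False, True, False]))
```

In this model, ten elements with a maximum of four lanes take three simulated vector operations (4, 4, and 2 lanes), which is the kind of dynamic instruction saving a variable length design aims for when trip counts are not a multiple of the vector width.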

Findings

01

31% reduction in dynamic instructions for SPECFP2006

02

40% reduction in dynamic instructions for Physicsbench

03

up to 13% speedup over scalar baseline

Abstract

Hardware/Software (HW/SW) co-designed processors offer a promising solution to the power and complexity problems of modern microprocessors by keeping their hardware simple. They also employ several runtime optimizations to improve performance. One of the most potent of these, vectorization, has been used by modern microprocessors to exploit data-level parallelism through SIMD accelerators. Because of their hardware simplicity, these accelerators have grown in width from 64-bit vectors in Intel MMX to 512-bit vector units in Intel Xeon Phi and AVX-512. Although SIMD accelerators are simple in terms of hardware design, code generation for them has always been a challenge, and the longer vectors of each new generation add to this complexity. This paper explores the scalability of SIMD accelerators from the code generation point of…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Interconnection Networks and Systems