Adaptable Register File Organization for Vector Processors

Cristóbal Ramírez Lazo; Enrico Reggiani; Carlos Rojas Morales; Roger Figueras Bagué; Luis Alfonso Villa Vargas; Marco Antonio Ramírez Salinas; Mateo Valero Cortés; Osman Sabri Unsal; Adrián Cristal

arXiv:2111.05301 · cs.AR · May 31, 2022

Open Access

TL;DR

This paper introduces AVA, an adaptable vector processor architecture that dynamically reconfigures vector length to optimize performance and resource utilization across applications with varying data parallelism levels.

Contribution

AVA is a novel vector processor design that combines efficiency for short vectors with the ability to reconfigure for longer vectors, improving performance and resource use.

Findings

01. AVA achieves a 2X speedup over its default configuration.

02. AVA saves 50% of the area compared to a long-vector vector processor (VP).

03. A reconfigurable Maximum Vector Length (MVL) improves performance across diverse applications.

Abstract

Modern scientific applications are getting more diverse, and the vector lengths in those applications vary widely. Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., Fujitsu A64FX with 512-bit ARM SVE vector support, or long vectors, e.g., NEC Aurora Tsubasa with 16Kbits Maximum Vector Length (MVL). Unfortunately, both approaches have drawbacks. On the one hand, short vector length VP designs struggle to provide high efficiency for applications featuring long vectors with high Data Level Parallelism (DLP). On the other hand, long vector VP designs waste resources and underutilize the Vector Register File (VRF) when executing low DLP applications with short vector lengths. Therefore, those long vector VP implementations are limited to a specialized subset of applications, where relatively high DLP must be present to achieve excellent performance with…
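The trade-off described in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes 64-bit elements, and the helper names (`elements_per_register`, `strip_mine_iterations`, `register_utilization`) are hypothetical. It compares a 512-bit register with a 16 Kbit register (taken here as 16 × 1024 bits) in terms of how many strip-mined loop iterations a long vector needs and how much of one register a short vector fills.

```python
import math

ELEM_BITS = 64  # assumption: double-precision (64-bit) elements


def elements_per_register(mvl_bits, elem_bits=ELEM_BITS):
    """Number of elements that fit in one vector register of mvl_bits."""
    return mvl_bits // elem_bits


def strip_mine_iterations(n, mvl_elems):
    """Loop iterations needed to process n elements, mvl_elems at a time."""
    return math.ceil(n / mvl_elems)


def register_utilization(n, mvl_elems):
    """Fraction of one register's element slots used by an n-element vector."""
    return min(n, mvl_elems) / mvl_elems


short_mvl = elements_per_register(512)        # short-vector design
long_mvl = elements_per_register(16 * 1024)   # long-vector design

# High-DLP case: a 1000-element vector needs many more iterations
# on the short-vector design.
long_app = 1000
iters_short = strip_mine_iterations(long_app, short_mvl)
iters_long = strip_mine_iterations(long_app, long_mvl)

# Low-DLP case: a 16-element vector fills only a small fraction of
# one long vector register.
short_app = 16
util_short = register_utilization(short_app, short_mvl)
util_long = register_utilization(short_app, long_mvl)

print(f"MVL (elements): short={short_mvl}, long={long_mvl}")
print(f"iterations for n={long_app}: short={iters_short}, long={iters_long}")
print(f"utilization for n={short_app}: short={util_short}, long={util_long}")
```

Under these assumptions the short-vector design needs far more iterations for the long vector, while the long-vector design leaves most of each register idle for the short one. That is the gap an adaptable MVL is meant to close.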

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques