Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Mohammed Elbtity; Peyton Chandarana; Ramtin Zand

arXiv:2407.08700 · cs.AR · July 12, 2024 · 3 cites

TL;DR

The paper introduces Flex-TPU, a TPU whose dataflow can be reconfigured at runtime on a per-layer basis. It reports up to 2.75x better performance than a conventional fixed-dataflow TPU, with minimal area and power overhead.

Contribution

Develops the first runtime reconfigurable dataflow TPU, enabling dynamic dataflow changes per layer to optimize performance.
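To make the per-layer idea concrete, the following is a minimal sketch of how a dataflow could be chosen layer by layer and compared with the best single fixed dataflow. It is not the paper's method: the function names, the `layer_cycles` table, and every cycle count in it are hypothetical placeholders, and any cost of switching dataflows between layers is ignored.

```python
from typing import Dict, List, Tuple

# Input-, output-, and weight-stationary dataflows.
DATAFLOWS = ("IS", "OS", "WS")


def best_fixed_dataflow(cycles: List[Dict[str, int]]) -> Tuple[str, int]:
    """Pick the single dataflow that minimizes total cycles over all layers."""
    totals = {df: sum(layer[df] for layer in cycles) for df in DATAFLOWS}
    best = min(totals, key=totals.get)
    return best, totals[best]


def per_layer_dataflow(cycles: List[Dict[str, int]]) -> Tuple[List[str], int]:
    """Pick the cheapest dataflow independently for each layer."""
    choices = [min(DATAFLOWS, key=layer.get) for layer in cycles]
    total = sum(layer[df] for layer, df in zip(cycles, choices))
    return choices, total


# Hypothetical per-layer cycle estimates (made-up numbers, not from the paper).
layer_cycles = [
    {"IS": 900, "OS": 400, "WS": 700},
    {"IS": 300, "OS": 650, "WS": 500},
    {"IS": 800, "OS": 750, "WS": 350},
]

fixed_choice, fixed_total = best_fixed_dataflow(layer_cycles)
layer_choices, flexible_total = per_layer_dataflow(layer_cycles)
print(fixed_choice, fixed_total)
print(layer_choices, flexible_total)
```

A per-layer choice can never be worse than the best fixed choice under this simplified model, because each layer takes its own minimum. Any real gain depends on how much the preferred dataflow actually varies across layers.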

Findings

01. Achieves up to 2.75x performance improvement over a conventional TPU.

02. Maintains minimal area and power overheads.

03. Validates effectiveness across multiple ML workloads.

Abstract

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML accelerators, like graphics processing units (GPUs), being designed specifically to perform the multiply-accumulate (MAC) operations required in the matrix-matrix and matrix-vector multiplies extensively present throughout the execution of deep neural networks (DNNs). Such improvements include maximizing data reuse and minimizing data transfer by leveraging the temporal dataflow paradigms provided by the systolic array architecture. While this design provides a significant performance benefit, the current implementations are restricted to a single dataflow consisting of either input, output, or weight stationary architectures. This can limit the…
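The input, output, and weight stationary dataflows named in the abstract differ in which operand stays resident in a processing element while the others stream past. The sketch below is a loop-nest analogy in plain Python, not a cycle-accurate systolic array model and not the paper's hardware. The function names are hypothetical, and each variant simply pins a different value in the innermost loop.

```python
def matmul_output_stationary(A, B):
    """Each partial sum stays in place; inputs and weights stream past it."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0  # accumulator pinned to output position (i, j)
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def matmul_weight_stationary(A, B):
    """Each weight is fetched once and reused across every input row."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for k in range(K):
        for j in range(N):
            w = B[k][j]  # weight pinned for the whole inner loop
            for i in range(M):
                C[i][j] += A[i][k] * w
    return C


def matmul_input_stationary(A, B):
    """Each input value is fetched once and reused across every output column."""
    M, K, N = len(A), len(B), len(B[0])
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for k in range(K):
            x = A[i][k]  # input pinned for the whole inner loop
            for j in range(N):
                C[i][j] += x * B[k][j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
results = [
    matmul_output_stationary(A, B),
    matmul_weight_stationary(A, B),
    matmul_input_stationary(A, B),
]
print(results[0])
```

All three orderings compute the same product. What changes is which operand is reused without being re-fetched, which is why the best choice can depend on a layer's shape.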

Taxonomy

Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems