TriADA: Massively Parallel Trilinear Matrix-by-Tensor Multiply-Add Algorithm and Device Architecture for the Acceleration of 3D Discrete Transformations

Stanislav Sedukhin (1), Yoichi Tomioka (1), Kazuya Matsumoto (1), Yuichi Okuyama (1) ((1) The University of Aizu, Japan)

arXiv:2506.22818 · cs.DC · July 1, 2025


Open Access

TL;DR

TriADA introduces a massively parallel, energy-efficient architecture and algorithms for accelerating 3D discrete transformations and tensor operations, significantly improving performance and scalability in high-performance computing and AI workloads.

Contribution

The paper presents a novel massively parallel algorithm and device architecture for efficient 3D tensor transformations, addressing computational and energy challenges in HPC and AI.

Findings

01. TriADA performs trilinear transformations of hypercubic arithmetic complexity in a linear number of time-steps.

02. The architecture scales efficiently with problem size and reduces energy consumption.

03. It effectively accelerates multilinear tensor operations in demanding workloads.

Abstract

Multilinear transformations are key in high-performance computing (HPC) and artificial intelligence (AI) workloads, where data is represented as tensors. However, their high computational and memory demands, which grow with dimensionality, often slow down critical tasks. Moreover, scaling computation by increasing the number of parallel processing units substantially increases energy consumption, limiting widespread adoption, especially for sparse data, which is common in HPC and AI applications. This paper introduces the Trilinear Algorithm and isomorphic-to-algorithm Device Architecture (TriADA) to address these challenges with the following innovations: (1) a massively parallel, low-rank algorithm for computing a family of trilinear (3D) discrete orthogonal transformations (3D-DXTs), which is a special case of the more general 3-mode matrix-by-tensor multiplication (3D-GEMT); (2) a…
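To make the operation in item (1) concrete, the following is a minimal NumPy sketch of a 3-mode matrix-by-tensor multiplication applied as three successive mode products, with an orthonormal DCT-II matrix standing in as one example of a discrete orthogonal transform. It is an illustration under stated assumptions, not the TriADA algorithm or its device mapping: the names `dct_matrix`, `gemt3`, and `gemt3_macs` are hypothetical, and the choice of DCT-II is ours. For a cubic N×N×N input, the multiply-add count of this staged evaluation is 3N^4, which is one way to read the "hypercubic" arithmetic complexity mentioned above.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed here as one example of a DXT)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)           # rescale the DC row for orthonormality
    return c

def gemt3(x, a, b, c):
    """Y[p,q,r] = sum over i,j,k of A[p,i] * B[q,j] * C[r,k] * X[i,j,k],
    evaluated as three mode products, one per tensor dimension."""
    y = np.einsum("pi,ijk->pjk", a, x)   # mode-1 product
    y = np.einsum("qj,pjk->pqk", b, y)   # mode-2 product
    y = np.einsum("rk,pqk->pqr", c, y)   # mode-3 product
    return y

def gemt3_macs(x_shape, a, b, c):
    """Multiply-add count of the three-stage evaluation in gemt3."""
    i, j, k = x_shape
    p, q, r = a.shape[0], b.shape[0], c.shape[0]
    return p * i * j * k + q * j * p * k + r * k * p * q

if __name__ == "__main__":
    n = 8
    rng = np.random.default_rng(0)
    x = rng.standard_normal((n, n, n))
    c = dct_matrix(n)
    y = gemt3(x, c, c, c)                  # forward 3D transform
    x_back = gemt3(y, c.T, c.T, c.T)       # inverse via the transposes
    print(np.allclose(x, x_back))
    print(gemt3_macs(x.shape, c, c, c), 3 * n**4)
```

Because the coefficient matrix is orthogonal, applying its transpose along each mode inverts the transform, which the round trip in the usage section checks numerically. The staged form is chosen only for readability; a hardware-oriented formulation would schedule the same sums differently.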


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Data Storage Technologies