Optimizing Tensor Train Decomposition in DNNs for RISC-V Architectures Using Design Space Exploration and Compiler Optimizations

Theologos Anthimopoulos; Milad Kokhazadeh; Vasilios Kelefouras; Benjamin Himpel; Georgios Keramidas

arXiv:2602.01996 · cs.LG · February 3, 2026

TL;DR

This paper presents a comprehensive methodology combining design space exploration and compiler optimizations to efficiently deploy Tensor Train Decomposed DNN layers on RISC-V architectures, significantly improving inference speed.

Contribution

It introduces an end-to-end exploration approach and a specialized tool for optimizing low-rank decompositions of fully connected layers on RISC-V, reducing inference time.

Findings

01 Tensor Train Decomposition layers run 3x faster than IREE

02 Decomposed layers are 8x faster than Pluto on the same model

03 Proposed method effectively optimizes DNN deployment on RISC-V edge devices

Abstract

Deep neural networks (DNNs) have become indispensable in many real-life applications such as natural language processing and autonomous systems. However, deploying DNNs on resource-constrained devices, e.g., on RISC-V platforms, remains challenging due to the high computational and memory demands of fully connected (FC) layers, which dominate resource consumption. Low-rank factorization (LRF) offers an effective approach to compressing FC layers, but the vast design space of LRF solutions involves trade-offs among FLOPs, memory size, inference time, and accuracy, making the LRF process complex and time-consuming. This paper introduces an end-to-end LRF design space exploration methodology and a specialized design tool for optimizing FC layers on RISC-V processors. Using Tensor Train Decomposition (TTD) offered by the TensorFlow T3F library, the proposed work prunes the LRF design…
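The memory side of the trade-off mentioned in the abstract can be illustrated with a small parameter count. The sketch below is not the paper's tool and does not call T3F; it assumes the standard TT-matrix layout, in which an FC weight matrix with factorized input and output sizes is stored as cores of shape (r[k-1], m[k], n[k], r[k]) with boundary ranks equal to 1. The function names, mode factorization, and ranks are hypothetical and chosen only for illustration.

```python
from math import prod


def tt_matrix_params(in_modes, out_modes, ranks):
    """Count parameters of a TT-matrix (assumed standard layout).

    Core k is assumed to have shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must have length d + 1 and start and end with 1")
    return sum(
        ranks[k] * m * n * ranks[k + 1]
        for k, (m, n) in enumerate(zip(in_modes, out_modes))
    )


def dense_params(in_modes, out_modes):
    """Parameter count of the equivalent dense FC weight matrix (bias ignored)."""
    return prod(in_modes) * prod(out_modes)


# Hypothetical 1024 x 1024 FC layer, factorized as 4 * 8 * 8 * 4 on both sides.
in_modes = (4, 8, 8, 4)
out_modes = (4, 8, 8, 4)

# Sweep a few uniform internal ranks: one tiny slice of the LRF design space.
for r in (2, 4, 8, 16):
    ranks = (1, r, r, r, 1)
    tt = tt_matrix_params(in_modes, out_modes, ranks)
    dense = dense_params(in_modes, out_modes)
    print(f"rank {r:2d}: {tt:6d} TT params vs {dense} dense ({dense / tt:.1f}x smaller)")
```

Even this four-point sweep shows how quickly the candidate set grows once the mode factorization and per-core ranks are allowed to vary independently. Parameter count alone says nothing about accuracy or inference time, which have to be measured separately.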

Taxonomy

Topics: Tensor decomposition and applications · Parallel Computing and Optimization Techniques · Advanced Neural Network Applications