Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory Tensor Manipulation for High-Throughput AI SoC
Weiyu Zhou, Zheng Wang, Chao Chen, Yike Li, Yongkui Yang, Zhuoyu Wu, Anupam Chattopadhyay

TL;DR
The paper introduces the Tensor Manipulation Unit (TMU), a reconfigurable near-memory hardware block that significantly accelerates tensor data movement and manipulation, reducing latency and improving pipeline efficiency in AI systems.
Contribution
It presents the design and integration of a reconfigurable TMU for efficient tensor data movement, supporting diverse operators and enhancing AI SoC performance.
Findings
Achieves up to 1413x latency reduction over ARM A72.
Supports over 10 tensor manipulation operators.
Reduces end-to-end inference latency by 34.6% when integrated with TPU.
Abstract
While recent advances in AI SoC design have focused heavily on accelerating tensor computation, the equally critical task of tensor manipulation, centered on high,volume data movement with minimal computation, remains underexplored. This work addresses that gap by introducing the Tensor Manipulation Unit (TMU), a reconfigurable, near-memory hardware block designed to efficiently execute data-movement-intensive operators. TMU manipulates long datastreams in a memory-to-memory fashion using a RISC-inspired execution model and a unified addressing abstraction, enabling broad support for both coarse- and fine-grained tensor transformations. Integrated alongside a TPU within a high-throughput AI SoC, the TMU leverages double buffering and output forwarding to improve pipeline utilization. Fabricated in SMIC 40nm technology, the TMU occupies only 0.019 mm2 while supporting over 10…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsParallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Neural Network Applications
