TensorPool: A 3D-Stacked 8.4TFLOPS/4.3W Many-Core Domain-Specific Processor for AI-Native Radio Access Networks
Marco Bertuletti, Yichao Zhang, Diyou Shen, Alessandro Vanelli-Coralli, Frank K. Gürkaynak, Luca Benini

TL;DR
TensorPool is a 3D-stacked many-core processor specialized for AI-native radio access networks. It delivers 8.4 TFLOPS of tensor throughput at 4.3 W, targeting the strict power and latency constraints of 6G base stations.
Contribution
We introduce TensorPool, a domain-specific, 3D-stacked many-core processor with tensor acceleration, optimized for AI-based 6G RAN, and quantify its performance and efficiency gains.
Findings
TensorPool achieves 3643 MACs/cycle with 89% tensor-unit utilization.
It provides 6× more tensor performance than a core-only cluster.
TensorPool improves GOPS/W/mm² efficiency by 9.1×.
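The headline figures can be cross-checked with a little arithmetic. The sketch below is a back-of-the-envelope check, not taken from the paper: it assumes that utilization is defined as achieved MACs/cycle divided by peak MACs/cycle, and that the 8.4 TFLOPS and 4.3 W in the title refer to the same operating point. The function and constant names are illustrative only.

```python
# Figures quoted in the title and Findings of this summary.
ACHIEVED_MACS_PER_CYCLE = 3643
TENSOR_UNIT_UTILIZATION = 0.89
THROUGHPUT_TFLOPS = 8.4
POWER_WATTS = 4.3


def implied_peak_macs_per_cycle(achieved: float, utilization: float) -> float:
    """Peak MAC rate, ASSUMING utilization = achieved / peak."""
    return achieved / utilization


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Energy efficiency, ASSUMING both figures share one operating point."""
    return tflops / watts


peak = implied_peak_macs_per_cycle(ACHIEVED_MACS_PER_CYCLE, TENSOR_UNIT_UTILIZATION)
efficiency = tflops_per_watt(THROUGHPUT_TFLOPS, POWER_WATTS)

print(f"Implied peak: {peak:.0f} MACs/cycle")
print(f"Energy efficiency: {efficiency:.2f} TFLOPS/W")
```

Under these assumptions the implied peak is roughly 4093 MACs/cycle and the efficiency roughly 1.95 TFLOPS/W. Both inputs are rounded in the source, so the derived values are indicative only.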
Abstract
The upcoming integration of AI in the physical layer (PHY) of 6G radio access networks (RAN) will enable a higher quality of service in challenging transmission scenarios. However, deeply optimized AI-Native PHY models impose higher computational complexity than conventional baseband processing, challenging deployment under the sub-millisecond real-time constraints typical of modern PHYs. Additionally, following the extension to terahertz carriers, the upcoming densification of 6G cell-sites further limits the power consumption of base stations, constraining the budget available for compute (100 W). The desired flexibility to ensure long-term sustainability and the imperative energy-efficiency gains on the high-throughput tensor computations dominating AI-Native PHYs can be achieved by domain-specialization of many-core programmable baseband processors. Following the domain-specialization…
