The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference

Fang Li

arXiv:2511.22889·cs.AR·December 1, 2025

TL;DR

The paper introduces the Immutable Tensor Architecture, a novel hardware paradigm that encodes model weights directly into ASIC interconnects, eliminating memory bottlenecks for energy-efficient AI inference on edge devices.

Contribution

It proposes a new hardware architecture that treats model weights as physical circuit topology, removing the need for traditional memory hierarchies in AI inference.

Findings

01. Eliminates memory hierarchy to reduce energy consumption.

02. Enables secure, energy-efficient LLM deployment on edge devices.

03. Uses a 'Split-Brain' design with host CPU and ASIC for flexible inference.

Abstract

The deployment of Large Language Models (LLMs) on consumer edge devices is throttled by the "Memory Wall": the prohibitive bandwidth and energy cost of fetching gigabytes of model weights from DRAM for every token generated. Current architectures (GPUs, NPUs) treat model weights as mutable software data, incurring massive energy penalties to maintain general-purpose programmability. We propose the Immutable Tensor Architecture (ITA), a paradigm shift that treats model weights not as data, but as physical circuit topology. By encoding parameters directly into the metal interconnects and logic of mature-node ASICs (28nm/40nm), ITA eliminates the memory hierarchy entirely. We present a "Split-Brain" system design where a host CPU manages dynamic KV-cache operations while the ITA ASIC acts as a stateless, ROM-embedded dataflow engine.
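
The "Split-Brain" division of labor described in the abstract can be illustrated with a small software analogy. The sketch below is not the paper's implementation: the names `ita_project`, `HostKVCache`, `host_attention`, and `decode_step` are hypothetical, the 2x2 weight values are made up, and the choice to run the attention softmax on the host is an assumption (the abstract only states that the host manages KV-cache operations). It shows the core idea only: the "ASIC" side is a pure function over constant weights, while all mutable, per-sequence state lives on the host.

```python
import math

# Hypothetical frozen weights. Immutable tuples stand in for parameters
# that ITA would fix in circuit topology; the values are illustrative only.
W_Q = ((1.0, 0.0), (0.0, 1.0))
W_K = ((0.5, 0.0), (0.0, 0.5))
W_V = ((1.0, 1.0), (0.0, 1.0))


def matvec(w, x):
    """Multiply a fixed matrix (tuple of rows) by a vector."""
    return tuple(sum(wij * xj for wij, xj in zip(row, x)) for row in w)


def ita_project(x):
    """Stateless 'ASIC' stage: a pure function of its input.

    The weights are constants, so nothing is fetched or updated per token.
    """
    return matvec(W_Q, x), matvec(W_K, x), matvec(W_V, x)


class HostKVCache:
    """Host-side mutable state: grows by one (key, value) pair per token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def host_attention(q, cache):
    """Softmax attention over cached keys/values (assumed to run on the host)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in cache.keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(cache.values[0])
    return tuple(
        sum(w * v[d] for w, v in zip(weights, cache.values)) / total
        for d in range(dim)
    )


def decode_step(x, cache):
    """One token: the 'ASIC' projects, the host updates the cache and attends."""
    q, k, v = ita_project(x)
    cache.append(k, v)
    return host_attention(q, cache)
```

Calling `decode_step` repeatedly with the same `HostKVCache` instance models token-by-token generation: the cache grows on the host, while `ita_project` returns the same outputs for the same input no matter how many tokens came before.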

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Big Data and Digital Economy · Security and Verification in Computing