Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators
Animan Naskar

TL;DR
Tessera is a hardware architecture for secure, near-line-rate weight streaming on UMA edge accelerators. It decrypts weights inline at cache-line granularity, adding security without significant bandwidth overhead.
Contribution
The paper presents Tessera, a novel architecture that achieves secure, high-bandwidth weight streaming for UMA edge accelerators through parallel cryptographic processing at cache-line granularity.
Findings
Tessera achieves 98.4% of theoretical memory bandwidth with minimal overhead.
It maintains optimal bandwidth for vision and language models, unlike page-level encryption.
Tessera effectively neutralizes major UMA-specific attack vectors.
Abstract
Deploying proprietary Deep Neural Networks (DNNs) on commodity edge devices demands hardware-backed Digital Rights Management (DRM) capable of withstanding both software-level and physical adversaries. In Unified Memory Architecture (UMA) systems, the host CPU and Neural Processing Unit (NPU) share physical DRAM, leaving plaintext model weights directly readable by a compromised OS kernel. Existing defenses fail in this constrained setting: trusted execution environments monopolize scarce memory with permanently reserved regions, while full-memory encryption operates at page granularity. This forces the system to fetch massive 4 KB memory pages for sub-page tensor tiles, severely crippling bandwidth. We present Tessera, a reference architecture for inline, cache-line granularity weight decryption on UMA edge accelerators. The design intercepts 64-byte AXI bursts, computing AES-256-CTR…
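The granularity argument in the abstract can be illustrated with a little arithmetic. The following is a minimal sketch, not the paper's model: it assumes every access is rounded out to whole granules (4 KB pages or 64-byte lines, both sizes taken from the abstract), and the tile offsets and sizes are hypothetical values chosen only for illustration.

```python
PAGE_BYTES = 4096  # page granularity named in the abstract
LINE_BYTES = 64    # cache-line / AXI burst size named in the abstract


def bytes_fetched(offset: int, length: int, granule: int) -> int:
    """Bytes transferred when an access is rounded out to whole granules."""
    if length <= 0:
        return 0
    first = offset // granule                # index of first granule touched
    last = (offset + length - 1) // granule  # index of last granule touched
    return (last - first + 1) * granule


def amplification(offset: int, length: int, granule: int) -> float:
    """Ratio of bytes transferred to bytes actually needed."""
    return bytes_fetched(offset, length, granule) / length


# Hypothetical 512-byte tensor tile aligned to the start of a page.
print(amplification(0, 512, PAGE_BYTES))  # 8.0
print(amplification(0, 512, LINE_BYTES))  # 1.0

# Hypothetical 200-byte tile that straddles a page boundary.
print(bytes_fetched(4000, 200, PAGE_BYTES))  # 8192
print(bytes_fetched(4000, 200, LINE_BYTES))  # 256
```

Under these assumptions, a sub-page tile pays for at least one full page at page granularity, and for two pages when it straddles a boundary, while line granularity keeps the transfer close to the tile's own size.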
