KARIPAP: Quantum-Inspired Tensor Network Compression of Large Language Models Using Infinite Projected Entangled Pair States and Tensor Renormalization Group
Azree Nazri

TL;DR
KARIPAP is a quantum-inspired tensor network method that uses iPEPS and TRG to compress large language models, reducing memory and computation while keeping accuracy loss minimal.
Contribution
The paper presents a novel tensor network compression technique for LLMs that captures complex inter-layer correlations using quantum-inspired methods, enabling scalable and efficient AI models.
Findings
Achieves up to 93% memory reduction and 70% parameter reduction on LLaMA-2 7B.
Trains 50% faster and runs inference 25% faster, with minimal accuracy loss.
Reveals redundancy in deeper layers through entanglement profiling.
Abstract
Large Language Models (LLMs) like ChatGPT and LLaMA drive rapid progress in generative AI, yet their huge parameter scales create severe computational and environmental burdens. High training costs, energy use, and limited device deployment hinder accessibility. Existing compression methods (pruning, distillation, low-rank factorization, and quantization) reduce size but ignore complex inter-layer correlations. We propose KARIPAP, a quantum-inspired tensor network compression method using Infinite Projected Entangled Pair States (iPEPS) and Tensor Renormalization Group (TRG) contraction. Unlike 1D Matrix Product States, iPEPS captures multi-directional entanglement in attention and deep transformer layers. TRG ensures polynomial-time contraction, making tensorization feasible while preserving key correlation geometry. Experiments on LLaMA-2 7B show up to 93% memory and 70% parameter reduction, with 50% faster…
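To give a feel for how a bond dimension controls parameter count in tensorized layers, here is a minimal sketch. It is not the KARIPAP iPEPS/TRG construction. It assumes the simplest possible case, a single weight matrix split into two factors joined by one bond of size chi (a plain low-rank factorization). The 4096 x 4096 shape and the function names are illustrative assumptions, and the 70% target is borrowed from the abstract only as an example budget.

```python
# Hypothetical helpers for illustration; a two-factor split, not iPEPS or TRG.

def dense_params(rows: int, cols: int) -> int:
    """Parameter count of an unfactorized rows x cols weight matrix."""
    return rows * cols


def factored_params(rows: int, cols: int, chi: int) -> int:
    """Parameter count of W ~= A @ B with A: rows x chi and B: chi x cols."""
    return chi * (rows + cols)


def max_bond_dim(rows: int, cols: int, reduction: float) -> int:
    """Largest bond dimension chi whose factorization still removes at
    least `reduction` (a fraction in [0, 1)) of the dense parameters."""
    budget = (1.0 - reduction) * dense_params(rows, cols)
    return int(budget // (rows + cols))


# Assumed example shape: one square 4096 x 4096 projection matrix.
rows = cols = 4096
chi = max_bond_dim(rows, cols, reduction=0.70)
kept = factored_params(rows, cols, chi) / dense_params(rows, cols)
print(f"bond dimension chi = {chi}, fraction of parameters kept = {kept:.4f}")
```

In this toy setting, the bond dimension is the only knob: a smaller chi removes more parameters but also limits how much correlation the factorization can carry. Tensor networks with more indices per tensor generalize the same trade-off.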
