TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices
Mingxue Xu, Yao Lei Xu, Danilo P. Mandic

TL;DR
This paper introduces TensorSLM, a training-free embedding compression method using Tensor-Train Decomposition, enabling energy-efficient deployment of sub-billion parameter language models on low-end devices like Raspberry Pi.
Contribution
It proposes a novel, training-free tensor decomposition technique for embedding compression that maintains performance while reducing energy consumption on edge devices.
Findings
Achieves 2x compression of embedding layers.
Halves energy consumption per query.
Maintains comparable language task performance.
Abstract
Small Language Models (SLMs, or on-device LMs) have significantly fewer parameters than Large Language Models (LLMs). They are typically deployed on low-end devices, like mobile phones and single-board computers. Unlike LLMs, which rely on increasing model size for better generalisation, SLMs designed for edge applications are expected to have adaptivity to the deployment environments and energy efficiency given the device battery life constraints, which are not addressed in datacenter-deployed LLMs. This paper addresses these two requirements by proposing a training-free token embedding compression approach using Tensor-Train Decomposition (TTD). Each pre-trained token embedding vector is converted into a lower-dimensional Matrix Product State (MPS). We comprehensively evaluate the extracted low-rank structures across compression ratio, language task performance, latency, and energy…
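The abstract's central operation, converting a single embedding vector into Tensor-Train (Matrix Product State) form, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `tt_svd` and `tt_reconstruct`, the 64-dimensional vector, the (4, 4, 4) reshaping, and the rank cap of 2 are all hypothetical choices made for illustration. The sketch uses the standard sequential truncated-SVD procedure, which needs no training.

```python
import numpy as np

def tt_svd(vec, shape, max_rank):
    """Split a 1-D vector into tensor-train (MPS) cores by sequential truncated SVD.

    `shape` and `max_rank` are illustrative hyperparameters, not values from the paper.
    Core k has shape (rank_{k-1}, shape[k], rank_k), with boundary ranks equal to 1.
    """
    assert int(np.prod(shape)) == vec.size, "shape must factor the vector length"
    cores = []
    rank_prev = 1
    remainder = np.asarray(vec, dtype=np.float64).reshape(shape)
    for dim in shape[:-1]:
        # Unfold so rows couple the previous rank with the current mode.
        mat = remainder.reshape(rank_prev * dim, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, s.size)
        cores.append(u[:, :rank].reshape(rank_prev, dim, rank))
        # Carry the truncated singular values and right factors to the next step.
        remainder = s[:rank, None] * vt[:rank]
        rank_prev = rank
    cores.append(remainder.reshape(rank_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the cores back into a flat vector."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(-1)

# Hypothetical stand-in for one pre-trained token embedding.
rng = np.random.default_rng(0)
embedding = rng.standard_normal(64)

cores = tt_svd(embedding, shape=(4, 4, 4), max_rank=2)
stored = sum(core.size for core in cores)
approx = tt_reconstruct(cores)

print("parameters stored:", stored, "of", embedding.size)
print("relative error:", np.linalg.norm(embedding - approx) / np.linalg.norm(embedding))
```

A random vector like the one above has no low-rank structure, so its reconstruction error at this rank is large. Whether real pre-trained embeddings tolerate such truncation is the question the paper evaluates. Because each vector is decomposed on its own, individual tokens could in principle be compressed or updated separately, which is one plausible reading of the adaptivity requirement named in the abstract.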
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Topic Modeling · Network Packet Processing and Optimization
Methods: OPT
