TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices

Mingxue Xu; Yao Lei Xu; Danilo P. Mandic

arXiv:2506.13514·cs.CL·June 17, 2025

Open Access

TL;DR

This paper introduces TensorSLM, a training-free embedding compression method based on Tensor-Train Decomposition that enables energy-efficient deployment of sub-billion parameter language models on low-end devices such as the Raspberry Pi.

Contribution

It proposes a training-free tensor decomposition technique for embedding compression that keeps language task performance comparable while reducing energy consumption on edge devices.

Findings

01. Achieves 2x compression of embedding layers.

02. Halves energy consumption per query.

03. Maintains comparable language task performance.

Abstract

Small Language Models (SLMs, or on-device LMs) have significantly fewer parameters than Large Language Models (LLMs). They are typically deployed on low-end devices, such as mobile phones and single-board computers. Unlike LLMs, which rely on increasing model size for better generalisation, SLMs designed for edge applications are expected to adapt to their deployment environments and to be energy efficient given device battery life constraints, two requirements that datacenter-deployed LLMs do not address. This paper addresses both requirements by proposing a training-free token embedding compression approach using Tensor-Train Decomposition (TTD). Each pre-trained token embedding vector is converted into a lower-dimensional Matrix Product State (MPS). We comprehensively evaluate the extracted low-rank structures across compression ratio, language task performance, latency, and energy…
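
To make the idea of converting one embedding vector into an MPS concrete, here is a minimal sketch of the generic TT-SVD procedure (sequential truncated SVDs) in NumPy. It is not the authors' implementation. The function names `tt_svd` and `tt_reconstruct`, the 768-dimensional random vector, the factorisation `(4, 4, 4, 4, 3)`, and the rank cap of 2 are all assumptions chosen for illustration. A random vector has no low-rank structure, so the truncation error printed here says nothing about the results reported in the paper.

```python
import numpy as np


def tt_svd(vector, dims, max_rank):
    """Split a 1-D vector into tensor-train (MPS) cores via truncated SVDs.

    Hypothetical helper: dims must multiply to vector.size, and every
    TT rank is capped at max_rank.
    """
    assert vector.size == int(np.prod(dims))
    cores = []
    rank = 1
    remainder = vector.reshape(1, -1)  # shape: (current rank, remaining elements)
    for n in dims[:-1]:
        # Fold the next mode into the rows, then split it off with an SVD.
        remainder = remainder.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(remainder, full_matrices=False)
        new_rank = min(max_rank, s.size)
        cores.append(u[:, :new_rank].reshape(rank, n, new_rank))
        # Carry the truncated singular values and right factors forward.
        remainder = s[:new_rank, None] * vt[:new_rank]
        rank = new_rank
    cores.append(remainder.reshape(rank, dims[-1], 1))
    return cores


def tt_reconstruct(cores):
    """Contract the cores back into a flat vector."""
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape(-1)


# Illustrative usage on a random stand-in for one token embedding.
rng = np.random.default_rng(0)
embedding = rng.standard_normal(768)
cores = tt_svd(embedding, dims=(4, 4, 4, 4, 3), max_rank=2)

stored = sum(core.size for core in cores)
approx = tt_reconstruct(cores)
rel_error = np.linalg.norm(embedding - approx) / np.linalg.norm(embedding)
print("stored parameters:", stored, "of", embedding.size)
print("relative reconstruction error:", rel_error)
```

The rank cap is the knob that trades storage for accuracy: a larger `max_rank` keeps more singular values at each split, and once the cap exceeds every intermediate matrix rank the reconstruction is exact up to floating-point error.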

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Topic Modeling · Network Packet Processing and Optimization

Methods: OPT