Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Wei Chen; Zhiyuan Li; Shuo Xin; Yihao Wang

arXiv:2408.15518·cs.CL·September 4, 2024

TL;DR

This paper introduces Dolphin, a novel architecture that treats long textual contexts as a separate modality, enabling energy-efficient and low-latency on-device language processing without sacrificing accuracy.

Contribution

Dolphin repurposes the image embedding projector used in vision-language models to encode long textual contexts, reducing energy consumption and latency in on-device language models.

Findings

01. 10-fold improvement in energy efficiency
02. 5-fold reduction in latency
03. Maintains response quality with extended contexts

Abstract

This paper presents Dolphin, a novel decoder-decoder architecture for energy-efficient processing of long contexts in language models. Our approach addresses the significant energy consumption and latency challenges inherent in on-device models. Dolphin employs a compact 0.5B parameter decoder to distill extensive contextual information into a memory embedding, substantially reducing the input length for the primary 7B parameter decoder model. Inspired by vision-language models, we repurpose the image embedding projector to encode long textual contexts, effectively treating extended context as a distinct modality. This innovative method enables processing of substantially longer contexts without the typical computational overhead associated with extended input sequences. Empirical evaluations demonstrate a 10-fold improvement in energy efficiency and a 5-fold reduction in latency…
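The data flow described in the abstract can be illustrated with a toy sketch: a small decoder yields a handful of memory states, a projector maps them into the main decoder's embedding space, and the main decoder receives those memory embeddings plus the query instead of the full context. This is only a sketch under stated assumptions, not the authors' implementation: the projector is assumed to be a single linear map, the small decoder's output is replaced by random stand-in vectors, and every size and function name below is hypothetical.

```python
import random


def matvec(weights, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def project_memory(memory_states, projector):
    """Map each small-decoder memory state into the main decoder's embedding space."""
    return [matvec(projector, state) for state in memory_states]


def build_main_input(memory_embeddings, query_embeddings):
    """Prepend the projected memory embeddings to the query token embeddings."""
    return memory_embeddings + query_embeddings


# Hypothetical toy sizes, NOT taken from the paper.
CONTEXT_TOKENS = 512     # length of the long context
MEMORY_TOKENS = 16       # number of memory embeddings kept
QUERY_TOKENS = 8         # length of the user query
D_SMALL, D_MAIN = 4, 6   # hidden sizes of the small and main decoders

rng = random.Random(0)

# Stand-in for the small decoder's hidden states at the memory positions.
memory_states = [[rng.uniform(-1, 1) for _ in range(D_SMALL)]
                 for _ in range(MEMORY_TOKENS)]

# Assumed linear projector of shape (D_MAIN x D_SMALL).
projector = [[rng.uniform(-1, 1) for _ in range(D_SMALL)]
             for _ in range(D_MAIN)]

# Stand-in for the main decoder's embeddings of the query tokens.
query_embeddings = [[rng.uniform(-1, 1) for _ in range(D_MAIN)]
                    for _ in range(QUERY_TOKENS)]

main_input = build_main_input(project_memory(memory_states, projector),
                              query_embeddings)

# Sequence length the main decoder would see without compression.
baseline_length = CONTEXT_TOKENS + QUERY_TOKENS

print(len(main_input), "positions instead of", baseline_length)
```

With these toy sizes the main decoder processes 24 positions rather than 520; the numbers only illustrate the shortening of the input and say nothing about the energy or latency figures reported in the paper.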

Code & Models

Models

NexaAI/Squid (Hugging Face model · 12 downloads · 35 likes)

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Context-Aware Activity Recognition Systems