Return of the Encoder: Maximizing Parameter Efficiency for SLMs

Mohamed Elfeki; Rui Liu; Chad Voegele

arXiv:2501.16273·cs.CL·January 31, 2025

TL;DR

This paper demonstrates that encoder-decoder architectures are more efficient than decoder-only models for small language models, offering significant latency and throughput advantages on edge devices, and introduces a knowledge distillation method to enhance their performance.

Contribution

The paper provides a systematic analysis of encoder-decoder versus decoder-only models for small language models and introduces a knowledge distillation framework to improve encoder-decoder capabilities.

Findings

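01 Encoder-decoder models achieve 47% lower first-token latency than decoder-only models on edge devices.

02 Encoder-decoder models achieve 4.7x higher throughput than decoder-only models on edge devices.

03 Knowledge distillation from decoder-only teachers improves encoder-decoder performance by up to 6 average points across diverse tasks.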
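To make the first two findings concrete, the sketch below applies the reported relative gains to a baseline measurement. The function name and the baseline numbers in the usage lines are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
# Reported relative gains, taken from the findings above.
LATENCY_REDUCTION = 0.47   # 47% lower first-token latency
THROUGHPUT_FACTOR = 4.7    # 4.7x higher throughput


def apply_reported_gains(baseline_latency_ms, baseline_tokens_per_s):
    """Scale a decoder-only baseline by the reported encoder-decoder gains.

    This is plain arithmetic on the headline numbers, not a performance model.
    """
    latency_ms = baseline_latency_ms * (1.0 - LATENCY_REDUCTION)
    tokens_per_s = baseline_tokens_per_s * THROUGHPUT_FACTOR
    return latency_ms, tokens_per_s


# Hypothetical decoder-only baseline: 100 ms to first token, 10 tokens/s.
latency_ms, tokens_per_s = apply_reported_gains(100.0, 10.0)
print(f"first-token latency: {latency_ms:.1f} ms, throughput: {tokens_per_s:.1f} tokens/s")
```

Note that "47% lower" means the latency is multiplied by 0.53, while "4.7x higher" is a direct multiplier on throughput, so the two figures are not on the same scale.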

Abstract

The dominance of large decoder-only language models has overshadowed encoder-decoder architectures, despite their fundamental efficiency advantages in sequence processing. For small language models (SLMs) - those with 1 billion parameters or fewer - our systematic analysis across GPU, CPU, and NPU platforms reveals that encoder-decoder architectures achieve 47% lower first-token latency and 4.7x higher throughput compared to decoder-only models on edge devices. These gains may be attributed to encoder-decoder's one-time input processing and efficient separation of understanding and generation phases. We introduce a novel knowledge distillation framework that enables encoder-decoder models to leverage capabilities from large scalable decoder-only teachers while preserving their architectural advantages, achieving up to 6 average performance points improvement across diverse tasks, with…
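The abstract does not spell out the distillation objective, so the following is only a minimal sketch of a generic soft-label distillation loss, assuming a temperature-scaled KL term mixed with a cross-entropy term. The function names and the `temperature` and `alpha` parameters are assumptions for illustration and may differ from the framework the paper actually uses.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Generic soft-label distillation loss for a single token position.

    alpha weights the teacher-matching term; (1 - alpha) weights the
    ordinary cross-entropy against the ground-truth token.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The temperature**2 factor keeps gradient scale comparable across temperatures.
    kd_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    ce_term = -math.log(softmax(student_logits)[target_index])
    return alpha * kd_term + (1.0 - alpha) * ce_term


# Hypothetical logits over a 3-token vocabulary.
loss = distillation_loss([0.2, 1.0, 2.5], [0.1, 0.9, 3.0], target_index=2)
print(f"distillation loss: {loss:.4f}")
```

In this sketch the student is the small encoder-decoder model and the teacher is the larger decoder-only model; only their output logits need to be comparable, which is why the two architectures can differ.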

Code & Models

Repositories

microsoft/encoder-decoder-slm (PyTorch, official)

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Soft Robotics and Applications · Scheduling and Optimization Algorithms

Methods: Knowledge Distillation