Task-Centric Acceleration of Small-Language Models

Dor Tsur; Sharon Adar; Ran Levy

arXiv:2602.24174 · cs.CL · March 2, 2026

TL;DR

This paper introduces TASC, a framework for accelerating small language models through vocabulary enrichment and speculative decoding, improving efficiency in low-variability tasks without sacrificing performance.

Contribution

The paper presents TASC (Task-Adaptive Sequence Compression), a framework with two methods: TASC-ft, which fine-tunes the model to use an expanded vocabulary, and TASC-spec, which accelerates inference without extra training.

Findings

01

TASC-ft improves fine-tuning efficiency by vocabulary enrichment.

02

TASC-spec accelerates inference without additional training.

03

Both methods maintain task performance across multiple tasks.
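The vocabulary enrichment behind TASC-ft can be illustrated with a minimal sketch. It assumes a BPE-style loop over already-tokenized task outputs: each round finds the most frequent n-gram, registers it as one new token, and re-segments the corpus. The function names, the `rounds`, `max_n`, and `min_count` parameters, and the string-joining of merged tokens are all hypothetical; the paper's actual selection criterion and the fine-tuning step are not reproduced here.

```python
from collections import Counter


def most_frequent_ngram(corpus, max_n):
    """Return (ngram, count) for the most frequent n-gram with 2 <= n <= max_n."""
    counts = Counter()
    for seq in corpus:
        for n in range(2, max_n + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def merge_ngram(seq, ngram, new_token):
    """Replace every non-overlapping occurrence of ngram in seq with new_token."""
    out, i, n = [], 0, len(ngram)
    while i < len(seq):
        if tuple(seq[i:i + n]) == ngram:
            out.append(new_token)
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def enrich_vocabulary(corpus, rounds=2, max_n=3, min_count=2):
    """Iteratively add frequent output n-grams as single tokens (illustrative only)."""
    corpus = [list(seq) for seq in corpus]
    new_tokens = []
    for _ in range(rounds):
        ngram, count = most_frequent_ngram(corpus, max_n)
        if ngram is None or count < min_count:
            break
        new_token = "".join(ngram)  # assumption: merged tokens are named by concatenation
        new_tokens.append(new_token)
        corpus = [merge_ngram(seq, ngram, new_token) for seq in corpus]
    return new_tokens, corpus


# Toy task outputs for a hypothetical sentiment-labeling task.
outputs = [
    ["label", ":", "positive"],
    ["label", ":", "negative"],
    ["label", ":", "positive"],
]
new_tokens, compressed = enrich_vocabulary(outputs)
```

On this toy corpus the outputs shrink from nine tokens to four, which is the kind of sequence compression that reduces the number of decoding steps. In a real pipeline the new tokens would also need to be added to the tokenizer and given embeddings before fine-tuning.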

Abstract

Small language models (SLMs) have emerged as efficient alternatives to large language models for task-specific applications. However, they are often deployed in high-volume, low-latency settings, where efficiency is crucial. We propose TASC (Task-Adaptive Sequence Compression), a framework for SLM acceleration that covers two use cases. For SLM fine-tuning, we propose TASC-ft, which iteratively enriches the tokenizer vocabulary with high-frequency output n-grams and then fine-tunes the model to use the expanded vocabulary. For inference, we propose TASC-spec, a lightweight, training-free speculative decoding method that constructs an n-gram draft model from the task's output corpus, mixing task and context n-gram information. TASC-spec avoids any additional training while bypassing draft-target vocabulary alignment constraints. We…
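The idea of drafting with an n-gram model and verifying with the target model can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the target model is a stand-in callable `target_next(tokens)` that returns its greedy next token, verification is greedy exact-match, and the "mixing" of task and context information is a hypothetical rule that simply adds the two count tables. All function and parameter names are invented for the example, and `n` is assumed to be at least 2.

```python
from collections import Counter, defaultdict


def ngram_counts(sequences, n):
    """Map each (n-1)-token context to a Counter of the tokens that follow it."""
    table = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            table[tuple(seq[i:i + n - 1])][seq[i + n - 1]] += 1
    return table


def draft_tokens(prefix, task_table, n, k):
    """Propose up to k tokens from task counts plus counts taken from the prefix."""
    draft = []
    for _ in range(k):
        seq = prefix + draft
        ctx = tuple(seq[-(n - 1):])
        # Hypothetical mixing rule: add task-corpus counts and in-context counts.
        mixed = task_table.get(ctx, Counter()) + ngram_counts([seq], n).get(ctx, Counter())
        if not mixed:
            break
        draft.append(mixed.most_common(1)[0][0])
    return draft


def speculative_decode(prompt, target_next, task_table, n=3, k=4, max_new=16, eos="<eos>"):
    """Greedy speculative decoding; returns (tokens, number of verification steps)."""
    out, verify_steps = list(prompt), 0
    while len(out) - len(prompt) < max_new and out[-1:] != [eos]:
        draft = draft_tokens(out, task_table, n, k)
        step = []
        # A real system scores all draft positions in one batched forward pass;
        # here the target is queried per position for clarity.
        for tok in draft + [None]:  # the trailing None slot yields one extra target token
            expected = target_next(out + step)
            step.append(expected)
            if expected != tok or expected == eos:
                break
        out.extend(step)
        verify_steps += 1
    return out[:len(prompt) + max_new], verify_steps


# Toy setup: task outputs seen offline, and a target that always emits `reference`.
task_outputs = [
    ["=>", "label", ":", "positive", "<eos>"],
    ["=>", "label", ":", "negative", "<eos>"],
    ["=>", "label", ":", "positive", "<eos>"],
]
reference = ["label", ":", "positive", "<eos>"]
prompt = ["review", "great", "=>"]


def toy_target(seq):
    return reference[len(seq) - len(prompt)]


tokens, steps = speculative_decode(prompt, toy_target, ngram_counts(task_outputs, 3))
```

Because every emitted token is the target's own greedy choice, the output matches plain greedy decoding. In the toy run the four output tokens are produced in two verification steps instead of four, since the n-gram draft correctly guesses three of them. The draft works on token sequences only, so it needs no separate draft network.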

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis