Task-Centric Acceleration of Small-Language Models
Dor Tsur, Sharon Adar, Ran Levy

TL;DR
This paper introduces TASC, a framework for accelerating small language models through vocabulary enrichment and speculative decoding, improving efficiency in low-variability tasks without sacrificing performance.
Contribution
The paper presents TASC (Task-Adaptive Sequence Compression), a framework with two methods: TASC-ft, which fine-tunes the model to use a vocabulary enriched with high-frequency output n-grams, and TASC-spec, a training-free speculative decoding method that accelerates inference.
Findings
TASC-ft accelerates the fine-tuned model by enriching the tokenizer vocabulary with high-frequency output n-grams.
TASC-spec accelerates inference without additional training.
Both methods maintain task performance across multiple tasks.
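The vocabulary enrichment behind TASC-ft can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the selection rule (most frequent n-gram, ties broken toward longer n-grams), the `min_count` stopping threshold, and the string concatenation used to name new tokens are all assumptions made for illustration. The fine-tuning step that teaches the model to use the new tokens is not shown.

```python
from collections import Counter


def count_ngrams(sequences, max_n):
    """Count every n-gram of length 2..max_n over the tokenized outputs."""
    counts = Counter()
    for seq in sequences:
        for n in range(2, max_n + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return counts


def merge(seq, ngram, new_token):
    """Replace each left-to-right occurrence of ngram with a single token."""
    out, i, n = [], 0, len(ngram)
    while i < len(seq):
        if tuple(seq[i:i + n]) == ngram:
            out.append(new_token)
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def enrich_vocabulary(sequences, rounds=10, max_n=4, min_count=2):
    """Iteratively add the most frequent output n-gram as a new token.

    Assumed selection rule: highest count, ties broken toward longer n-grams.
    Stops early once no n-gram occurs at least min_count times.
    """
    sequences = [list(s) for s in sequences]
    new_tokens = []
    for _ in range(rounds):
        counts = count_ngrams(sequences, max_n)
        if not counts:
            break
        ngram, count = max(counts.items(), key=lambda kv: (kv[1], len(kv[0])))
        if count < min_count:
            break
        token = "".join(ngram)  # hypothetical naming scheme for the new token
        new_tokens.append(token)
        sequences = [merge(s, ngram, token) for s in sequences]
    return new_tokens, sequences


outputs = [
    ["{", '"label"', ":", '"pos"', "}"],
    ["{", '"label"', ":", '"neg"', "}"],
]
new_tokens, compressed = enrich_vocabulary(outputs)
```

In this toy corpus the shared prefix of the two outputs becomes a single token, so each output shrinks from five tokens to three. Fewer output tokens means fewer decoding steps, which is where the speed-up would come from under these assumptions.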
Abstract
Small language models (SLMs) have emerged as efficient alternatives to large language models for task-specific applications. However, they are often employed in high-volume, low-latency settings, where efficiency is crucial. We propose TASC, Task-Adaptive Sequence Compression, a framework for SLM acceleration comprising two use-cases: When performing SLM fine-tuning, we propose TASC-ft, which iteratively enriches the tokenizer vocabulary with high-frequency output n-grams and then fine-tunes the model to utilize the expanded vocabulary. Next, we propose an inference-time method, termed TASC-spec. TASC-spec is a lightweight, training-free speculative decoding method that constructs an n-gram draft model from the task's output corpus, mixing task and context n-gram information. TASC-spec avoids any additional training, while bypassing draft-target vocabulary alignment constraints. We…
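The n-gram drafting idea described for TASC-spec can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the fixed context order, the additive `task_weight`/`context_weight` mixing, the greedy acceptance rule, and the `target_next` callable standing in for the target SLM are all assumptions. A real system would verify all drafted positions in one batched forward pass rather than one call per token.

```python
from collections import Counter, defaultdict


def build_ngram_table(sequences, order=2):
    """Map each `order`-token context to a Counter of observed next tokens."""
    table = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            table[tuple(seq[i:i + order])][seq[i + order]] += 1
    return table


def draft(prefix, task_table, context_table, order=2, k=4,
          task_weight=1.0, context_weight=1.0):
    """Propose up to k tokens by mixing task-corpus and in-context counts."""
    proposal, cur = [], list(prefix)
    for _ in range(k):
        key = tuple(cur[-order:])
        scores = Counter()
        for tok, c in task_table.get(key, {}).items():
            scores[tok] += task_weight * c
        for tok, c in context_table.get(key, {}).items():
            scores[tok] += context_weight * c
        if not scores:
            break  # unseen context: stop drafting
        tok = max(scores, key=scores.get)
        proposal.append(tok)
        cur.append(tok)
    return proposal


def speculative_step(prefix, target_next, task_table, context_table, **kw):
    """Accept drafted tokens while they match the target's greedy choice.

    Always returns at least one token chosen by the target, so the output
    equals plain greedy decoding with the target model.
    """
    accepted = []
    for tok in draft(prefix, task_table, context_table, **kw):
        expected = target_next(list(prefix) + accepted)
        if tok != expected:
            accepted.append(expected)  # target's correction ends the step
            return accepted
        accepted.append(tok)
    accepted.append(target_next(list(prefix) + accepted))  # bonus token
    return accepted


# Toy setup: the "target model" deterministically continues a fixed sequence.
reference = ["a", "b", "c", "d", "x"]
target_next = lambda prefix: reference[len(prefix)]

task_table = build_ngram_table([["a", "b", "c", "d", "e"]])
context_table = build_ngram_table([["q", "r"]])  # prompt too short: empty table
step = speculative_step(["a", "b"], target_next, task_table, context_table)
```

Here the draft proposes `c`, `d`, `e`; the target agrees on the first two and replaces the third with `x`, so one step emits three tokens. Because the draft works on the target's own token sequences and needs no separate draft network, no vocabulary alignment between two models arises in this sketch.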
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
