Stacking Small Language Models for Generalizability

Laurence Liang

arXiv:2410.15570·cs.CL·October 22, 2024

TL;DR

This paper proposes fine-tuning stacks of language models (FSLM), an approach that stacks small language models as a cost-effective, interpretable alternative to large language models, and reports promising early results on natural language benchmarks.

Contribution

The paper introduces FSLM, which stacks small language models, each fine-tuned for a specific task, to improve generalizability and interpretability while lowering training and inference costs relative to large language models.

Findings

01. FSLM achieves competitive performance on benchmarks.

02. FSLM reduces training and inference costs.

03. FSLM enhances interpretability through natural language communication.

Abstract

Recent advances show that large language models (LLMs) generalize well, with strong performance across different natural language benchmarks. However, the large size of LLMs makes training and inference expensive and impractical to run in resource-limited settings. This paper introduces a new approach called fine-tuning stacks of language models (FSLM), which involves stacking small language models (SLMs) as an alternative to LLMs. By fine-tuning each SLM to perform a specific task, this approach breaks down high-level reasoning into multiple lower-level steps, each handled by a dedicated SLM. As a result, FSLM allows for lower training and inference costs, and also improves model interpretability, as each SLM communicates with the subsequent one through natural language. By evaluating FSLM on common natural language benchmarks, this paper highlights promising early results toward…
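
The stacking idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `run_stack` helper, the `Stage` type, and the two toy stage functions are hypothetical names chosen for illustration, and plain string functions stand in for fine-tuned SLMs. The sketch only shows the control flow, in which each stage receives the previous stage's natural-language output, and every intermediate message is kept so the chain can be inspected.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a fine-tuned SLM: text in, text out.
Stage = Callable[[str], str]


def run_stack(
    stages: List[Tuple[str, Stage]], prompt: str
) -> Tuple[str, List[Tuple[str, str]]]:
    """Pass text through each stage in order and record every message."""
    trace: List[Tuple[str, str]] = []
    text = prompt
    for name, stage in stages:
        text = stage(text)          # output of one stage feeds the next
        trace.append((name, text))  # readable intermediate message
    return text, trace


# Toy stages (assumptions, not real models): each does one narrow job.
def extract_topic(question: str) -> str:
    return "topic: " + question.split()[-1].rstrip("?")


def answer_from_topic(message: str) -> str:
    return "answer about " + message.replace("topic: ", "", 1)


stages = [("extract_topic", extract_topic), ("answer", answer_from_topic)]
final, trace = run_stack(stages, "What is stacking?")
for name, message in trace:
    print(f"{name}: {message}")
```

Because every hand-off is plain text, the recorded trace doubles as an explanation of how the final output was produced. In a real setting each stage would wrap a call to a separately fine-tuned small model.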

Taxonomy

Topics: Natural Language Processing Techniques