Can Small Language Models Learn, Unlearn, and Retain Noise Patterns?
Nicy Scaria, Silvester John Joseph Kennedy, Deepak Subramani

TL;DR
This paper investigates how small language models with 1-3 billion parameters learn, unlearn, and retain various noise patterns, revealing differences based on model size, training data quality, and adaptation strategies.
Contribution
The paper provides the first comprehensive analysis of noise handling in small language models, highlighting factors that influence their robustness and offering practical training strategies.
Findings
Smaller models like Olmo adapt quickly to noise patterns.
High-quality pretraining data in Phi2 enhances noise resistance.
Training on clean data effectively mitigates the effects of noise.
Abstract
With the growing need for efficient language models in resource-constrained environments, Small Language Models (SLMs) have emerged as compact and practical alternatives to Large Language Models (LLMs). While studies have explored noise handling in LLMs, little is known about how SLMs handle noise, a critical factor for their reliable real-world deployment. This study investigates the ability of SLMs with parameters between 1 and 3 billion to learn, retain, and subsequently eliminate different types of noise (word flip, character flip, transliteration, irrelevant content, and contradictory information). Four pretrained SLMs (Olmo 1B, Qwen1.5 1.8B, Gemma1.1 2B, and Phi2 2.7B) were instruction-tuned on noise-free data and tested with in-context examples to assess noise learning. Subsequently, noise patterns were introduced in instruction tuning to assess their adaptability. The results…
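To make two of the noise types named in the abstract concrete, here is a minimal sketch in Python. The abstract does not define the transformations, so the definitions below are assumptions made for illustration: "word flip" is taken to mean reversing the order of words in a sentence, and "character flip" is taken to mean reversing the characters inside each word. The function names `word_flip` and `char_flip` are hypothetical and are not taken from the paper or its code.

```python
def word_flip(text: str) -> str:
    """Reverse the order of whitespace-separated words.

    Assumed definition of "word flip"; the paper may define it differently.
    """
    return " ".join(reversed(text.split()))


def char_flip(text: str) -> str:
    """Reverse the characters inside each word while keeping word order.

    Assumed definition of "character flip"; the paper may define it differently.
    """
    return " ".join(word[::-1] for word in text.split())


sample = "small models learn noise"
print(word_flip(sample))  # noise learn models small
print(char_flip(sample))  # llams sledom nrael esion
```

Under these assumed definitions, both transformations are their own inverse, so applying either one twice to single-spaced text returns the original string. That makes it easy to check whether a model's output follows the noise pattern or the clean text.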
Taxonomy
Topics: Speech Recognition and Synthesis
