Sensitivity of Small Language Models to Fine-tuning Data Contamination

Nicy Scaria; Silvester John Joseph Kennedy; Deepak Subramani

arXiv:2511.06763·cs.CL·November 11, 2025

TL;DR

This paper systematically investigates how small language models are affected by data contamination during instruction tuning. It reveals significant vulnerabilities, especially to syntactic transformations, and an unexpected increase in susceptibility to semantic contamination in larger models.

Contribution

It provides empirical evidence of contamination vulnerabilities in small language models, highlights the asymmetric effects of syntactic and semantic transformations, and proposes evaluation protocols for assessing robustness.

Findings

01

Syntactic transformations cause near-complete failure across models.

02

Larger models are more susceptible to semantic contamination.

03

Alignment does not consistently improve resilience and sometimes reduces robustness.

Abstract

Small Language Models (SLMs) are increasingly being deployed in resource-constrained environments, yet their behavioral robustness to data contamination during instruction tuning remains poorly understood. We systematically investigate the contamination sensitivity of 23 SLMs (270M to 4B parameters) across multiple model families by measuring their susceptibility to two types of transformation applied during instruction tuning: syntactic transformations (character and word reversal) and semantic transformations (irrelevant and counterfactual responses), each applied at contamination levels of 25%, 50%, 75%, and 100%. Our results reveal fundamental asymmetries in vulnerability patterns: syntactic transformations cause catastrophic performance degradation, with character reversal producing near-complete failure across all models regardless of size or family, while semantic…
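The abstract names the two syntactic transformations and the four contamination levels. The sketch below is a minimal illustration, not the authors' code. It assumes that each instruction-tuning example is a dict with hypothetical `instruction` and `response` keys, that only the response is transformed, and that a contamination level means the fraction of training examples whose responses are replaced. The helpers `reverse_characters`, `reverse_words`, and `contaminate` are hypothetical names.

```python
import random
from typing import Callable

# Hypothetical schema: {"instruction": ..., "response": ...}
Example = dict[str, str]


def reverse_characters(text: str) -> str:
    # Character reversal: "hello world" -> "dlrow olleh"
    return text[::-1]


def reverse_words(text: str) -> str:
    # Word reversal: keeps each word intact but reverses their order.
    # Assumption: whitespace tokenization; runs of whitespace collapse to one space.
    return " ".join(reversed(text.split()))


def contaminate(
    dataset: list[Example],
    transform: Callable[[str], str],
    level: float,
    seed: int = 0,
) -> list[Example]:
    """Transform the responses of a `level` fraction of examples (assumed reading of 'level')."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed -> reproducible contaminated subset
    n_contaminated = round(level * len(dataset))
    chosen = set(rng.sample(range(len(dataset)), n_contaminated))
    out: list[Example] = []
    for i, ex in enumerate(dataset):
        new_ex = dict(ex)  # copy so the clean dataset is never mutated
        if i in chosen:
            new_ex["response"] = transform(ex["response"])
        out.append(new_ex)
    return out
```

For example, `contaminate(data, reverse_characters, 0.25)` would corrupt a quarter of the responses while leaving every instruction untouched; a level of 1.0 corresponds to the fully contaminated setting.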

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. This paper is not limited to a single model family but systematically tests 23 Small Language Models (SLMs) from various families, with parameters ranging from 270M to 4B, making its conclusions broadly representative.
2. This paper points to the conclusion of a "Capability Curse", where more capable, larger models are paradoxically more prone to learning incorrect semantic instructions (for the discussion of this second conclusion, please see the weakness analysis).
3. The prob

Weaknesses

W1: There are some issues with the presentation of the chart results. For example, Figure 1 lacks labels or captions that clearly explain the meaning of the horizontal and vertical axes, making it difficult to understand on first reading.
W2: Although the authors claim the "Capability Curse" is counterintuitive, it seems to be a common intuition that more complex models are less robust to training data contamination [1][2][3].
W3: In the Abstract and the "Alignment paradox" section of

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- The paper is well-written and easy to understand.
- This paper is, to my knowledge, the first to systematically investigate the impact of fine-tuning data contamination on SLMs at this scale, providing insights for real-world deployment.
- The authors' use of full fine-tuning instead of PEFT is worth noting.

Weaknesses

- The *irrelevant* dataset was constructed by pairing a question with a randomly selected answer from a different example in the clean dataset. While this tests for question-answer semantic correspondence, the irrelevant answers are still high-quality, well-formed, and grammatically correct responses, merely answers to the wrong questions. This may not fully represent other common types of data contamination, such as ‘garbage’ text, HTML tags, or unparseable noise, which might have a different (
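The construction of the *irrelevant* dataset described above (each question paired with a randomly selected answer from a different example in the clean dataset) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical `instruction`/`response` schema and reads the contamination level as the fraction of examples whose responses are swapped. The function `make_irrelevant` is a hypothetical name.

```python
import random

# Hypothetical schema: {"instruction": ..., "response": ...}
Example = dict[str, str]


def make_irrelevant(dataset: list[Example], level: float, seed: int = 0) -> list[Example]:
    """Replace a `level` fraction of responses with answers drawn from *other* clean examples."""
    n = len(dataset)
    if n < 2:
        raise ValueError("need at least two examples to borrow an answer from another one")
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n), round(level * n)))
    out: list[Example] = []
    for i, ex in enumerate(dataset):
        new_ex = dict(ex)  # leave the clean dataset untouched
        if i in chosen:
            # Draw uniformly from the n - 1 indices other than i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # skip index i so the answer never comes from the same example
            # The borrowed answer is still well-formed text, just for the wrong question.
            new_ex["response"] = dataset[j]["response"]
        out.append(new_ex)
    return out
```

Because the borrowed answers are taken verbatim from the clean data, this construction illustrates the reviewer's point: the contaminated responses remain fluent and grammatical, unlike garbage text or markup noise.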

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- The paper examines contamination patterns across a wide range of language models, varying in size and family, to ensure the generalizability of its findings.
- While it may seem intuitive that syntactic patterns are particularly harmful, the paper’s empirical demonstration of this is valuable. More broadly, the observed differences in how models react to various contamination patterns provide important insights.

Weaknesses

- It is unclear how realistic these transformations are in real-world settings, particularly the character and word reversal cases and the large-scale contamination levels (25%–100%). While code-switching may occur, especially in multilingual contexts, it represents a far less disruptive transformation. At such high levels of contamination, the primary concern may no longer be the model’s sensitivity or robustness.
- The paper does not have a **Related Work** section.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques · Software Engineering Research