Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text

Bitan Majumder; Anirban Sen

arXiv:2602.21933·cs.CL·February 26, 2026

Open Access

TL;DR

This study demonstrates that fine-tuning a smaller transformer-based model like DistilBERT on domain-specific data outperforms large language models in sarcasm detection for code-mixed Hinglish, especially in low-resource scenarios.

Contribution

It shows that domain-adaptive fine-tuning of small models can surpass large language models in sarcasm detection for code-mixed text.

Findings

01

The fine-tuned DistilBERT model achieved 84% accuracy, outperforming all four LLMs (Llama 3.1, Mistral, Gemma 3, and Phi-4).

02

Fine-tuning small models is effective in low-resource settings.

03

LLMs underperform in zero-shot and few-shot sarcasm detection in code-mixed Hinglish.

Abstract

Sarcasm detection in multilingual and code-mixed environments remains a challenging task for natural language processing models due to structural variations, informal expressions, and the scarcity of linguistic resources. This study compares four large language models (Llama 3.1, Mistral, Gemma 3, and Phi-4) with a fine-tuned DistilBERT model for sarcasm detection in code-mixed Hinglish text. The results indicate that the smaller, sequentially fine-tuned DistilBERT model achieved the highest overall accuracy of 84%, outperforming all of the LLMs in zero-shot and few-shot setups, while using only minimal LLM-generated code-mixed data for fine-tuning. These findings indicate that domain-adaptive fine-tuning of smaller transformer-based models may significantly improve sarcasm detection over general LLM inference in low-resource and data-scarce settings.
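
The zero-shot and few-shot LLM setups mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the prompt wording, the label names, the fallback rule in `parse_label`, and the helper names `build_prompt`, `parse_label`, and `accuracy` are all hypothetical. The sketch shows only how a prompt with or without labeled examples might be assembled and how responses might be scored; it does not call any model.

```python
from typing import List, Sequence, Tuple


def build_prompt(text: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    The instruction wording is a hypothetical choice, not the paper's prompt.
    """
    lines: List[str] = [
        "Classify the following Hinglish text as 'sarcastic' or 'not sarcastic'.",
        "Answer with the label only.",
        "",
    ]
    # Each labeled example becomes a Text/Label pair ahead of the query.
    for example_text, example_label in examples:
        lines.append(f"Text: {example_text}")
        lines.append(f"Label: {example_label}")
        lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(response: str) -> str:
    """Map a free-form model response onto one of the two labels."""
    cleaned = response.strip().lower()
    # Check the negative label first: it contains the positive label as a substring.
    if "not sarcastic" in cleaned or "non-sarcastic" in cleaned:
        return "not sarcastic"
    if "sarcastic" in cleaned:
        return "sarcastic"
    # Assumed fallback for responses that mention neither label.
    return "not sarcastic"


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    # Invented example sentence and label, used only to show the prompt layout.
    shots = [("Wah, kya traffic hai aaj, maza aa gaya.", "sarcastic")]
    print(build_prompt("Great, phir se light chali gayi.", shots))
```

Calling `build_prompt` with an empty `examples` sequence gives the zero-shot form, and passing a few labeled pairs gives the few-shot form. The model responses would then go through `parse_label` before being scored with `accuracy`.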

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Hate Speech and Cyberbullying Detection