Diagnosing Generalization Failures in Fine-Tuned LLMs: A Cross-Architectural Study on Phishing Detection
Frank Bobe III, Gregory D. Vetaw, Chase Pavlick, Darshan Bryner, Matthew Cook, and Jose Salas-Vernis

TL;DR
This study investigates why fine-tuned large language models fail to generalize well in phishing detection, revealing architecture and data influences, and proposing diagnostic methods to understand these failures.
Contribution
Introduces a multi-layered diagnostic framework and cross-architectural analysis to identify causes of generalization failures in fine-tuned LLMs.
Findings
Gemma 2 9B achieves >91% F1 with diverse data
Llama 3.1 8B struggles with diverse data integration
Mistral model shows consistent generalization across paradigms
Abstract
The practice of fine-tuning Large Language Models (LLMs) has achieved state-of-the-art performance on specialized tasks, yet diagnosing why these models become brittle and fail to generalize remains a critical open problem. To address this, we introduce and apply a multi-layered diagnostic framework to a cross-architectural study. We fine-tune Llama 3.1 8B, Gemma 2 9B, and Mistral models on a high-stakes phishing detection task and use SHAP analysis and mechanistic interpretability to uncover the root causes of their generalization failures. Our investigation reveals three critical findings: (1) Generalization is driven by a powerful synergy between architecture and data diversity. The Gemma 2 9B model achieves state-of-the-art performance (>91\% F1), but only when trained on a stylistically diverse ``generalist'' dataset. (2) Generalization is highly architecture-dependent. We diagnose…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpam and Phishing Detection · Topic Modeling · Misinformation and Its Impacts
