MIPIAD: Multilingual Indirect Prompt Injection Attack Defense with Qwen -- TF-IDF Hybrid and Meta-Ensemble Learning
Al Muhit Muhtadi, Mostafa Rifat Tazwar

TL;DR
MIPIAD is a defense framework against indirect prompt injection attacks in multilingual LLM systems, combining neural classifiers, lexical features, and ensemble methods, validated on a large synthetic benchmark in English and Bangla.
Contribution
It introduces a novel hybrid ensemble approach using neural, lexical, and validation techniques for multilingual prompt injection defense, with extensibility to over 200 languages.
Findings
Lexical signals alone are strong: TF-IDF+SVM reaches F1=0.77 in detection.
Hybrid ensemble achieves F1=0.9205 and AUROC=0.9378.
Ensemble methods reduce cross-lingual performance gaps.
Abstract
Indirect prompt injection remains a persistent weakness in retrieval-augmented and tool-using LLM systems, and the problem becomes harder to characterise in multilingual settings. We present MIPIAD, a defense framework evaluated on English and Bangla that combines a sequence classifier fine-tuned from Qwen2.5-1.5B via LoRA (XLPID), TF-IDF lexical features, and validation-tuned ensembling through late fusion, stacking, and gradient boosting. The framework is evaluated on a synthetic benchmark built from BIPIA (Yi et al., 2023) templates spanning five task families -- email, table, QA, abstract, and code -- comprising over 1.43 million generated samples, with train and test splits using mutually exclusive attack categories. Across the experiments, lexical signals prove strong (TF-IDF+SVM F1=0.77), and the hybrid XLPID+TF-IDF ensemble achieves the best overall F1 (0.9205) while the Boosting…
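To make the idea of validation-tuned late fusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes two detectors (a neural classifier and a lexical model) each emit an injection probability per sample, and that the fusion weight and decision threshold are picked by a simple grid search maximising F1 on a validation set. The function names `fuse` and `tune_late_fusion`, the grid search itself, and the toy scores are all hypothetical illustrations; none of them come from the paper.

```python
from typing import List, Sequence, Tuple


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Binary F1 for the positive (injection) class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fuse(neural: Sequence[float], lexical: Sequence[float], weight: float) -> List[float]:
    """Late fusion: convex combination of the two detectors' scores."""
    return [weight * n + (1.0 - weight) * l for n, l in zip(neural, lexical)]


def tune_late_fusion(
    neural: Sequence[float],
    lexical: Sequence[float],
    labels: Sequence[int],
    steps: int = 20,
) -> Tuple[float, float, float]:
    """Grid-search (weight, threshold) on validation data; returns the best pair and its F1."""
    best = (0.0, 0.5, -1.0)
    for i in range(steps + 1):
        weight = i / steps
        fused = fuse(neural, lexical, weight)
        for j in range(1, steps):
            threshold = j / steps
            preds = [int(score >= threshold) for score in fused]
            score = f1_score(labels, preds)
            if score > best[2]:
                best = (weight, threshold, score)
    return best


# Toy validation scores (made up): neither detector separates the classes alone.
val_labels = [1, 1, 1, 0, 0, 0]
val_neural = [0.9, 0.8, 0.3, 0.4, 0.1, 0.2]   # hypothetical neural classifier outputs
val_lexical = [0.6, 0.2, 0.9, 0.1, 0.5, 0.3]  # hypothetical lexical model outputs

best_weight, best_threshold, best_f1 = tune_late_fusion(val_neural, val_lexical, val_labels)
print(f"weight={best_weight:.2f} threshold={best_threshold:.2f} F1={best_f1:.2f}")
```

In this toy setup each detector misranks one positive sample, so only a mixed weight separates the classes. Stacking and gradient boosting, which the abstract also names, would replace the fixed convex combination with a learned meta-model over the same per-detector scores.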
