Genshin: General Shield for Natural Language Processing with Large Language Models
Xiao Peng, Tao Liu, Ying Wang

TL;DR
Genshin is a novel framework that leverages large language models as a one-time plug-in to recover original text and enhance interpretability and robustness in NLP tasks like sentiment analysis and spam detection.
Contribution
The paper introduces Genshin, a cascading framework that uses LLMs for text recovery, improving interpretability and robustness against adversarial attacks in NLP applications.
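The cascade described above can be sketched as a recovery step followed by a downstream classifier. This is a minimal illustration, not the authors' implementation: `toy_recover` is a hypothetical stand-in for the LLM recovery call (a simple character substitution table), `toy_classifier` is a hypothetical keyword-based spam detector, and `cascade` is an assumed name for the glue function.

```python
from typing import Callable, Tuple


def toy_recover(text: str) -> str:
    """Hypothetical stand-in for the LLM recovery step.

    Undoes a few character-level substitutions. A real system would
    prompt an LLM to restore the text; this table is illustrative only
    and would also rewrite legitimate digits.
    """
    substitutions = {"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"}
    return "".join(substitutions.get(ch, ch) for ch in text)


def toy_classifier(text: str) -> str:
    """Hypothetical keyword-based spam detector standing in for the downstream model."""
    spam_words = {"free", "prize", "winner"}
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    return "spam" if any(tok in spam_words for tok in tokens) else "ham"


def cascade(
    text: str,
    recover: Callable[[str], str],
    classify: Callable[[str], str],
) -> Tuple[str, str]:
    """Recover the text first, then classify the recovered text."""
    recovered = recover(text)
    return recovered, classify(recovered)


if __name__ == "__main__":
    attacked = "Claim your fr33 pr1ze now!"
    # The classifier alone misses the perturbed keywords.
    print(toy_classifier(attacked))
    # With the recovery step in front, the keywords are restored before classification.
    print(cascade(attacked, toy_recover, toy_classifier))
```

Keeping the recovery step as a separate callable reflects the plug-in idea: the downstream classifier is unchanged, and the recovery component can be swapped without retraining it.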
Findings
Genshin effectively recovers original text with high accuracy.
It exposes vulnerabilities of median models to adversarial attacks.
It demonstrates improved robustness and interpretability in NLP tasks.
Abstract
Large language models (LLMs) like ChatGPT, Gemini, or LLaMA have been trending recently, demonstrating considerable advancement and generalizability power in countless domains. However, LLMs create an even bigger black box exacerbating opacity, with interpretability limited to few approaches. The uncertainty and opacity embedded in LLMs' nature restrict their application in high-stakes domains like financial fraud, phishing, etc. Current approaches mainly rely on traditional textual classification with posterior interpretable algorithms, suffering from attackers who may create versatile adversarial samples to break the system's defense, forcing users to make trade-offs between efficiency and robustness. To address this issue, we propose a novel cascading framework called Genshin (General Shield for Natural Language Processing with Large Language Models), utilizing LLMs as defensive…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: LLaMA
