An Investigation on Group Query Hallucination Attacks
Kehao Miao, Xiaolong Jin

TL;DR
This paper introduces Group Query Attack, a method that tests large language models with multiple simultaneous questions, revealing vulnerabilities like performance degradation and backdoor activation across various tasks.
Contribution
It is the first to systematically analyze how grouped queries impact LLMs, highlighting new failure modes and security risks in multi-query scenarios.
Findings
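Group Query Attack significantly degrades the performance of models fine-tuned on specific tasks.
It can trigger potential backdoors in LLMs.
It is also effective on reasoning tasks, such as mathematical reasoning and code generation, for both pre-trained and aligned models.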
Abstract
With the widespread use of large language models (LLMs), understanding their potential failure modes during user interactions is essential. In practice, users often pose multiple questions in a single conversation with an LLM. In this study, we therefore propose Group Query Attack, a technique that simulates this scenario by presenting groups of queries to LLMs simultaneously. We investigate how the accumulated context from consecutive prompts influences the outputs of LLMs. Specifically, we observe that Group Query Attack significantly degrades the performance of models fine-tuned on specific tasks. Moreover, we demonstrate that Group Query Attack carries a risk of triggering potential backdoors in LLMs. Group Query Attack is also effective on tasks that involve reasoning, such as mathematical reasoning and code generation, for both pre-trained and aligned models.
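As a rough illustration of what "presenting groups of queries simultaneously" could look like, the sketch below packs several questions into one prompt. The helper name `build_group_query`, the `Q1:`/`Q2:` numbering, and the instruction line are all assumptions made for this example; the abstract does not specify the prompt template, so this should not be read as the authors' implementation.

```python
from typing import List


def build_group_query(questions: List[str]) -> str:
    """Join several questions into a single prompt, numbered in order.

    The header line and the "Q<i>:" numbering are illustrative
    assumptions, not the template used in the paper.
    """
    if not questions:
        raise ValueError("need at least one question")
    # Number each question so the model can answer them one by one.
    numbered = [f"Q{i}: {q.strip()}" for i, q in enumerate(questions, start=1)]
    header = "Answer each of the following questions."
    return "\n".join([header, *numbered])


# Baseline: the target question asked on its own.
single = build_group_query(["What is 17 * 3?"])

# Grouped: the same target question alongside unrelated ones.
grouped = build_group_query([
    "What is 17 * 3?",
    "Write a Python function that reverses a string.",
    "What is the derivative of x**2?",
])
print(grouped)
```

Comparing a model's answer to the target question under `single` and under `grouped` is one plausible way to measure the degradation the abstract describes; how the paper actually selects and orders the grouped queries is not stated here.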
