Prompt Injection Vulnerability of Consensus Generating Applications in Digital Democracy

Jairo Gudi\~no-Rosero; Cl\'ement Contet; Umberto Grandi; C\'esar A. Hidalgo

arXiv:2508.04281·cs.CY·March 3, 2026

Prompt Injection Vulnerability of Consensus Generating Applications in Digital Democracy

Jairo Gudi\~no-Rosero, Cl\'ement Contet, Umberto Grandi, C\'esar A. Hidalgo

PDF

TL;DR

This paper investigates the vulnerability of consensus-generating Large Language Models in digital democracy to prompt-injection attacks and proposes a robustness pipeline to mitigate these risks.

Contribution

It identifies specific vulnerabilities of off-the-shelf LLMs in consensus tasks and introduces a defense framework combining detection, structured opinions, and reinforcement learning.

Findings

01

Default LLMs are highly vulnerable to prompt-injection attacks.

02

The proposed robustness pipeline significantly reduces consensus shifts caused by attacks.

03

Vulnerabilities are especially pronounced when opinions are closely balanced.

Abstract

Large Language Models (LLMs) are gaining traction as a method to generate consensus statements and aggregate preferences in digital democracy experiments. Yet, LLMs could introduce critical vulnerabilities in these systems. Here, we examine the vulnerability and robustness of off-the-shelf consensus-generating LLMs to prompt-injection attacks, in which texts are injected to amplify particular viewpoints, erase certain opinions, or divert consensus toward unrelated or irrelevant topics. We construct attack-free and adversarial variants of prompts containing public policy questions and opinion texts, classify opinion and consensus valences with a fine-tuned BERT model, and estimate LLM-human majority agreement rates. Across topics, default LLaMA 3.1 8B Instruct, GPT-4.1 Nano, and Apertus 8B exhibit widespread vulnerability, specially when disagreement and disagreement are finely balanced,…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.