Adversarial Search Engine Optimization for Large Language Models

Fredrik Nestaas; Edoardo Debenedetti; Florian Tramèr

arXiv:2406.18382·cs.CR·July 3, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces Preference Manipulation Attacks that exploit large language models in search engines and chatbots, revealing vulnerabilities where attackers can manipulate content rankings to favor their products, potentially degrading overall LLM output quality.

Contribution

The paper presents a novel class of attacks called Preference Manipulation Attacks targeting LLMs, demonstrating their effectiveness on real-world search engines and APIs, and analyzing their implications.

Findings

01

Attacks successfully manipulated LLM rankings on Bing and Perplexity.

02

Manipulation increased traffic and monetization for attacker-controlled content.

03

The attacks create a prisoner's dilemma, leading to collective degradation of LLM outputs.

Abstract

Large Language Models (LLMs) are increasingly used in applications where the model selects from competing third-party content, such as in LLM-powered search engines or chatbot plugins. In this paper, we introduce Preference Manipulation Attacks, a new class of attacks that manipulate an LLM's selections to favor the attacker. We demonstrate that carefully crafted website content or plugin documentation can trick an LLM into promoting the attacker's products and discrediting competitors, thereby increasing user traffic and monetization. We show this leads to a prisoner's dilemma, where all parties are incentivized to launch attacks, but the collective effect degrades the LLM's outputs for everyone. We demonstrate our attacks on production LLM search engines (Bing and Perplexity) and plugin APIs (for GPT-4 and Claude). As LLMs are increasingly used to rank third-party content, we expect…
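The prisoner's dilemma described in the abstract can be illustrated with a toy two-seller game. This is a minimal sketch, not the paper's model: the strategy names, the `PAYOFFS` table, and every number in it are hypothetical values chosen only so that attacking is individually rational while mutual attack leaves both sellers worse off than mutual restraint.

```python
from itertools import product

STRATEGIES = ("honest", "attack")

# Hypothetical payoffs: fraction of LLM recommendations won by (seller A, seller B).
# Illustrative assumptions only; these are NOT results from the paper.
PAYOFFS = {
    ("honest", "honest"): (0.50, 0.50),
    ("attack", "honest"): (0.80, 0.10),
    ("honest", "attack"): (0.10, 0.80),
    ("attack", "attack"): (0.30, 0.30),  # both attack: output quality degrades for both
}


def best_response(opponent: str) -> str:
    """Seller A's payoff-maximizing strategy against a fixed move by seller B."""
    return max(STRATEGIES, key=lambda s: PAYOFFS[(s, opponent)][0])


def nash_equilibria() -> list[tuple[str, str]]:
    """Strategy pairs where neither seller gains by deviating unilaterally."""
    equilibria = []
    for a, b in product(STRATEGIES, repeat=2):
        a_ok = PAYOFFS[(a, b)][0] == max(PAYOFFS[(s, b)][0] for s in STRATEGIES)
        b_ok = PAYOFFS[(a, b)][1] == max(PAYOFFS[(a, s)][1] for s in STRATEGIES)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria


if __name__ == "__main__":
    print("Best response to honest:", best_response("honest"))
    print("Best response to attack:", best_response("attack"))
    print("Nash equilibria:", nash_equilibria())
```

Under these assumed payoffs, attacking is the best response to either opponent move, so the only equilibrium is mutual attack, even though each seller would earn more if both stayed honest.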

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Novel and highly relevant security problem for SEO in light of LLMs
- Well written, easy to read
- Strong threat model
- Proper disclosures are made
- Experiments are very well designed. Real world tests make the paper very convincing in terms of efficacy. Also, I like that there was at least one experiment in which the injection was made on highly ranked websites, since explicitly asking the LLM to consider a random website makes the results a bit less realistic.

Weaknesses

- This is an interesting security problem, but I have a hard time seeing why this paper is suitable for ICLR. The fundamental vector here is prompt injection (which is already well documented in the literature, as acknowledged by the authors). So there is little technical contribution that comes through to me in the paper, unless I am missing something. There is also plenty of literature talking about poisoning LLMs via "messing" with the internet, e.g., see the older Wikipedia dumps paper (Car

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper explores a new application scenario of prompt injections in LLM search engines / plugin APIs.
2. The experiments aim at real-world application using production LLM search engines (Bing and Perplexity) and plugin APIs (for GPT-4 and Claude). They build a webpage and feed it to the LLM with a realistic competing product webpage. It is convincing that current SOTA LLM-based recommendation systems are vulnerable to prompt injections.
3. From the dynamics study, the paper reveals an i

Weaknesses

1. I am not 100% sure about the practicality of the attack. If a product X is not well-known, it is very rare for a customer to know X and ask “What camera should I choose between Sony’s Alpha 7 IV and X”. An attackable scenario is that the customer asks “What camera should I choose for hiking” without offering options, and the LLM searches from all available cameras, including X, and ends up recommending X due to preference manipulation. In this setting, however, SEO seems much more important a

Reviewer 03 · Rating 8 · Confidence 4

Strengths

Originality:
* The paper demonstrates a novel SEO-like attack on LLM-based recommendation systems.

Quality:
* Good discussion in Sections 6.2 and 6.3 on the fundamental ambiguity of this sort of attack.
* Clear plots and narrative

Clarity:
* Clear presentation and demonstration of attack success

Significance:
* The attack demonstrates the dangers of increasingly relying on LLMs for systems that have more concrete mathematical solutions. The paper goes on to discuss how it is not clear at all

Weaknesses

* Abstract would likely benefit from mention of SEO or recommender systems to make it more clear to the user what the potential threat model for this attack is.
* Table 1 is not entirely clear – how is the recommendation rate calculated? Should the percentages sum to 100%?
* Typo in Fig 5 (b) “GPT-4 Trubo” -> “GPT-4 Turbo”

Taxonomy

Topics: Topic Modeling · Web Data Mining and Analysis · Natural Language Processing Techniques

Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer