Lessons from Defending Gemini Against Indirect Prompt Injections

Chongyang Shi; Sharon Lin; Shuang Song; Jamie Hayes; Ilia Shumailov; Itay Yona; Juliette Pluto; Aneesh Pappu; Christopher A. Choquette-Choo; Milad Nasr; Chawin Sitawarin; Gena Gibson; Andreas Terzis; John "Four" Flynn

arXiv:2505.14534·cs.CR·May 21, 2025

Lessons from Defending Gemini Against Indirect Prompt Injections

Chongyang Shi, Sharon Lin, Shuang Song, Jamie Hayes, Ilia Shumailov, Itay Yona, Juliette Pluto, Aneesh Pappu, Christopher A. Choquette-Choo, Milad Nasr, Chawin Sitawarin, Gena Gibson, Andreas Terzis, John "Four" Flynn

PDF

Open Access 1 Video

TL;DR

This paper discusses the evaluation of Gemini's robustness against sophisticated adversarial prompt injections, highlighting lessons learned and ongoing improvements to enhance model security.

Contribution

It introduces an adversarial evaluation framework for testing Gemini's resilience and shares insights from continuous testing against adaptive attacks.

Findings

01

Gemini shows vulnerabilities to prompt injections

02

Continuous adversarial testing improves model robustness

03

Adaptive attack techniques reveal new security challenges

Abstract

Gemini is increasingly used to perform tasks on behalf of users, where function-calling and tool-use capabilities enable the model to access user data. Some tools, however, require access to untrusted data introducing risk. Adversaries can embed malicious instructions in untrusted data which cause the model to deviate from the user's expectations and mishandle their data or permissions. In this report, we set out Google DeepMind's approach to evaluating the adversarial robustness of Gemini models and describe the main lessons learned from the process. We test how Gemini performs against a sophisticated adversary through an adversarial evaluation framework, which deploys a suite of adaptive attack techniques to run continuously against past, current, and future versions of Gemini. We describe how these ongoing evaluations directly help make Gemini more resilient against manipulation.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

AI Agents can write 10,000 lines of hacking code in seconds [Dr. Ilia Shumailov]· youtube

Taxonomy

TopicsLogic, programming, and type systems · Autonomous Vehicle Technology and Safety · Formal Methods in Verification

MethodsSparse Evolutionary Training