Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Nikolay Blagoev; O\u{g}uzhan Ersoy; Lydia Yiyu Chen

arXiv:2511.09780·cs.LG·April 15, 2026

Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO

Nikolay Blagoev, O\u{g}uzhan Ersoy, Lydia Yiyu Chen

PDF

1 Repo

TL;DR

This paper investigates the vulnerabilities of decentralized Group Relative Policy Optimization (GRPO) in training large language models, demonstrating effective adversarial attacks and proposing defenses, including logit-based filtering and LLM judging.

Contribution

It introduces the first adversarial attacks on decentralized GRPO and proposes two novel defense mechanisms to improve robustness.

Findings

01

Adversaries can achieve up to 100% attack success rate in 50 iterations.

02

Proposed defenses effectively prevent most attacks, except DoS.

03

Code for attacks and defenses is publicly available at https://github.com/gensyn-ai/HTTT.

Abstract

Group Relative Policy Optimization (GRPO) has demonstrated wide adoption in the post-training of Large Language Models (LLMs). In GRPO, prompts are answered by the model and preferred behaviour is learnt via reinforcement learning. Owing to the small communication volume, GRPO is inherently suitable for decentralised training as the prompts can be concurrently answered by multiple nodes and these completions are exchanged in the form of strings. In this work, we explore the robustness of decentralised GRPO by presenting the first adversarial attacks and countermeasures. We present a diverse set of attacks where malicious nodes poison benign models by sharing their poisoned completions. We demonstrate these attacks on math and coding tasks and show that an adversary can achieve attack success rates of up to 100% in as few as 50 iterations. Moreover, to mitigate the attacks, we propose…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

gensyn-ai/HTTT
github

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.