FLARE: Fingerprinting Deep Reinforcement Learning Agents using Universal Adversarial Masks
Buse G. A. Tekgul, N. Asokan

TL;DR
FLARE is a fingerprinting method that verifies ownership of Deep Reinforcement Learning (DRL) policies using universal adversarial masks. It reports 100% action agreement on stolen copies, no false positives on independently trained policies, and robustness to model modification attacks.
Contribution
The paper presents the first method to use universal adversarial masks as fingerprints for DRL policy verification. The masks transfer from a victim policy to its modified versions but not to independently trained policies.
Findings
FLARE achieves 100% action agreement on stolen copies of a policy.
It does not falsely accuse independent policies.
The method is robust against model modification attacks.
Abstract
We propose FLARE, the first fingerprinting mechanism to verify whether a suspected Deep Reinforcement Learning (DRL) policy is an illegitimate copy of another (victim) policy. We first show that it is possible to find non-transferable, universal adversarial masks, i.e., perturbations, to generate adversarial examples that can successfully transfer from a victim policy to its modified versions but not to independently trained policies. FLARE employs these masks as fingerprints to verify the true ownership of stolen DRL policies by measuring an action agreement value over states perturbed by such masks. Our empirical evaluations show that FLARE is effective (100% action agreement on stolen copies) and does not falsely accuse independent policies (no false positives). FLARE is also robust to model modification attacks and cannot be easily evaded by more informed adversaries without…
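The action agreement check described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy linear policies, the [0, 1] state range, and the randomly drawn mask are all assumptions made for the example. A real fingerprint would be a universal adversarial mask computed against the victim policy, and the decision threshold applied to the score is not specified here.

```python
import numpy as np

def action_agreement(victim_policy, suspect_policy, states, mask):
    """Fraction of masked states on which both policies pick the same action."""
    # Assumption: states are normalized to [0, 1], so the perturbed input is clipped.
    perturbed = np.clip(states + mask, 0.0, 1.0)
    victim_actions = np.argmax(victim_policy(perturbed), axis=1)
    suspect_actions = np.argmax(suspect_policy(perturbed), axis=1)
    return float(np.mean(victim_actions == suspect_actions))

def make_linear_policy(weights):
    """Hypothetical stand-in for a DRL policy: maps states to action scores."""
    return lambda states: states @ weights

rng = np.random.default_rng(0)
states = rng.uniform(0.0, 1.0, size=(256, 8))   # 256 toy states, 8 features each
mask = rng.uniform(-0.05, 0.05, size=8)         # placeholder, NOT an adversarial mask

victim_weights = rng.normal(size=(8, 4))        # 4 discrete actions
victim = make_linear_policy(victim_weights)
stolen = make_linear_policy(victim_weights.copy())          # exact copy of the victim
independent = make_linear_policy(rng.normal(size=(8, 4)))   # trained separately

stolen_score = action_agreement(victim, stolen, states, mask)
independent_score = action_agreement(victim, independent, states, mask)
print(f"stolen copy: {stolen_score:.2f}, independent policy: {independent_score:.2f}")
```

An exact copy always agrees with the victim, so its score is 1.0 by construction. The value of the adversarial mask in the paper's setting is that agreement stays high for modified copies while staying low for independently trained policies, which this toy setup does not demonstrate.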
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
