Why Neural Structural Obfuscation Can't Kill White-Box Watermarks for Good!
Yanna Jiang, Guangsheng Yu, Qingyuan Yu, Yi Chen, Qin Wang

TL;DR
This paper demonstrates that despite the disruptive effects of Neural Structural Obfuscation (NSO) on white-box watermarks, a novel recovery framework called Canon can fully restore model integrity and watermark verification by enforcing structural consistency.
Contribution
The paper introduces Canon, a comprehensive recovery method that counteracts NSO attacks by enforcing global structural consistency and synchronizing downstream model components.
Findings
Canon achieves 100% recovery success against NSO attacks.
Canon restores watermark verification without sacrificing model utility.
The approach is effective even under complex, extended NSO transformations.
Abstract
Neural Structural Obfuscation (NSO) (USENIX Security'23) is a family of "zero cost" structure-editing transforms (nso_zero, nso_clique, nso_split) that inject dummy neurons. By combining neuron permutation and parameter scaling, NSO makes a radical modification to the network structure and parameters while strictly preserving functional equivalence, thereby disrupting white-box watermark verification. This capability has been a fundamental challenge to the reliability of existing white-box watermarking schemes. We rethink NSO and, for the first time, fully recover from the damage it has caused. We redefine NSO as a graph-consistent threat model within a producer-consumer paradigm. This formulation posits that any obfuscation of a producer node necessitates a compatible layout update in all downstream consumers to maintain structural integrity.…
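The function-preserving edits the abstract describes can be illustrated on a toy two-layer ReLU network. The sketch below is not the paper's implementation of nso_zero, nso_clique, or nso_split, and it is not Canon; the names `mlp` and `obfuscate` are hypothetical. It assumes a dummy neuron with zero outgoing weights, a positive per-neuron scale, and a random permutation, and it shows why every change to the producer (hidden layer) needs a matching layout update in the consumer (output layer).

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU net: the hidden layer is the producer, the output layer the consumer."""
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

def obfuscate(W1, b1, W2, rng):
    """Illustrative function-preserving edit of the hidden layer (assumed, not the paper's code)."""
    n_in, n_hidden = W1.shape
    # 1) Dummy neuron: arbitrary incoming weights, zero outgoing weights.
    W1 = np.concatenate([W1, rng.normal(size=(n_in, 1))], axis=1)
    b1 = np.concatenate([b1, rng.normal(size=1)])
    W2 = np.concatenate([W2, np.zeros((1, W2.shape[1]))], axis=0)
    # 2) Positive scaling: ReLU(s * z) = s * ReLU(z) for s > 0,
    #    so the consumer rows are divided by the same s.
    s = rng.uniform(0.5, 2.0, size=n_hidden + 1)
    W1, b1, W2 = W1 * s, b1 * s, W2 / s[:, None]
    # 3) Permutation: producer columns and consumer rows are reordered identically.
    p = rng.permutation(n_hidden + 1)
    return W1[:, p], b1[p], W2[p, :]

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 4))

W1o, b1o, W2o = obfuscate(W1, b1, W2, rng)
# Outputs agree up to floating-point error even though shapes and values differ.
same_function = np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1o, b1o, W2o, b2))
```

In this toy setting the hidden layer grows from 3 to 4 neurons and every weight changes, yet the outputs match. A watermark read from fixed positions in `W1` would no longer line up, which is the kind of disruption the abstract attributes to NSO.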
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Generative Adversarial Networks and Image Synthesis
