# DynaMark: A Reinforcement Learning Framework for Dynamic Watermarking in Industrial Machine Tool Controllers

**Authors:** Navid Aftabi, Abhishek Hanchate, Satish Bukkapatnam, and Dan Li

arXiv: 2508.21797 · 2025-09-01

## TL;DR

DynaMark is a reinforcement learning framework that adaptively tunes dynamic watermarking in industrial machine tool controllers. It detects replay attacks in real time while using less watermark energy and preserving control performance.

## Contribution

It introduces a reinforcement learning (RL) approach that models dynamic watermarking as a Markov decision process (MDP). The watermark can therefore be adapted online without a model of the system, which improves both detection and energy efficiency.
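To make the MDP framing concrete, here is a minimal sketch under stated assumptions: actions are drawn from a small discrete set of candidate watermark variances, the reward is a hypothetical weighted sum of tracking error, watermark variance (as an energy proxy), and a detection-confidence score, and the learner is plain tabular Q-learning. The summary does not give DynaMark's actual state definition, reward weights, or RL algorithm, so `ACTIONS`, `reward`, `QAgent`, and every numeric weight below are illustrative only.

```python
import random
from collections import defaultdict

# Hypothetical discrete action set: candidate watermark variances sigma_e^2.
ACTIONS = (0.0, 0.01, 0.05, 0.2)


def reward(tracking_error, watermark_var, confidence,
           w_track=1.0, w_energy=2.0, w_conf=0.5):
    """Hypothetical reward trading off the three objectives named in the paper.

    tracking_error: deviation of the measured axis position from its reference.
    watermark_var:  variance of the injected watermark (proxy for its energy).
    confidence:     a detection-confidence score in [0, 1].
    """
    return (-w_track * tracking_error ** 2
            - w_energy * watermark_var
            + w_conf * confidence)


class QAgent:
    """Tabular Q-learning over a discretized state; a stand-in learner, not DynaMark's."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        # One row of action values per (hashable) discretized state.
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        """Epsilon-greedy choice of an index into ACTIONS."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, r, next_state):
        """One-step temporal-difference (Q-learning) update."""
        target = r + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a control loop, the agent would observe a discretized state (for example, binned tracking error and binned detection confidence), pick a variance from `ACTIONS`, inject a watermark drawn from a zero-mean Gaussian with that variance, compute the reward, and call `update`. Q-learning needs no plant model, which is in the spirit of adapting "without system knowledge"; a function approximator would replace the table for continuous states.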

## Key findings

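- Reduces watermark energy by 70% relative to constant-variance baselines while preserving the nominal trajectory
- Keeps the average detection delay to one sampling interval
- Validates the approach on a Siemens Sinumerik 828D digital twin and a physical stepper-motor testbed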
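The detection-delay result can be illustrated with the classical dynamic-watermarking residual test. This is a textbook stand-in, not DynaMark's detector. The sketch assumes a hypothetical scalar plant `x[k+1] = a*x[k] + b*u[k] + w[k]` with full state measurement, watermarked control `u[k] = -K*y[k] + e[k]`, and a replay attack that feeds back measurements recorded from the start of the run. Because the controller knows its private watermark `e[k]`, the residual `y[k+1] - (a - b*K)*y[k] - b*e[k]` reduces to process noise `w[k]` when measurements are honest. Under replay, the residual keeps an old watermark sample and loses the current one, so its variance jumps. The function name, parameter values, and the 5-sigma threshold are illustrative assumptions.

```python
import random


def replay_detection_delay(sigma_e, sigma_w=0.01, a=1.05, b=1.0, K=0.8,
                           n_steps=400, attack_start=300, threshold=25.0,
                           seed=0):
    """Return alarm sample minus attack onset (0 = first replayed sample), or None.

    Hypothetical scalar plant with watermark e ~ N(0, sigma_e^2) and process
    noise w ~ N(0, sigma_w^2). A negative return value would be a false alarm.
    """
    rng = random.Random(seed)
    x = 0.0
    recorded = []    # honest measurements, captured by the attacker
    y_seen = []      # measurements the controller actually receives
    e_hist = []      # private watermark samples, known only to the controller
    for k in range(n_steps):
        recorded.append(x)
        # Honest sensor before attack_start; afterwards the attacker replays
        # the measurements recorded from time 0 onward.
        y = x if k < attack_start else recorded[k - attack_start]
        y_seen.append(y)
        e = rng.gauss(0.0, sigma_e)       # zero-mean Gaussian watermark
        e_hist.append(e)
        u = -K * y + e                    # watermarked feedback control
        x = a * x + b * u + rng.gauss(0.0, sigma_w)

    # Residual test: honest data gives z = w[k], so (z / sigma_w)^2 is chi-square(1).
    for k in range(n_steps - 1):
        z = y_seen[k + 1] - (a - b * K) * y_seen[k] - b * e_hist[k]
        if (z / sigma_w) ** 2 > threshold:
            return k + 1 - attack_start
    return None
```

With `sigma_e=1.0` the alarm typically fires at or within a sample or two of the attack onset, whereas with `sigma_e=0.01` the replayed residual barely exceeds the process noise and the function often returns `None` over the same 100-sample attack window. A large constant variance buys fast detection at the cost of energy and tracking accuracy, which is the trade-off an adaptive watermark covariance targets.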

## Abstract

Industry 4.0's highly networked Machine Tool Controllers (MTCs) are prime targets for replay attacks that use outdated sensor data to manipulate actuators. Dynamic watermarking can reveal such tampering, but current schemes assume linear-Gaussian dynamics and use constant watermark statistics, which leaves them vulnerable when MTCs exhibit time-varying, partly proprietary behavior. We close this gap with DynaMark, a reinforcement learning framework that models dynamic watermarking as a Markov decision process (MDP). DynaMark learns an online policy that adapts the covariance of a zero-mean Gaussian watermark using available measurements and detector feedback, without requiring system knowledge. The policy maximizes a reward function that dynamically balances control performance, energy consumption, and detection confidence. We also develop a Bayesian belief-updating mechanism that provides real-time detection confidence for linear systems; this mechanism, which does not rely on specific system assumptions, underpins the MDP for systems with linear dynamics. On a digital twin of a Siemens Sinumerik 828D controller, DynaMark reduces watermark energy by 70% relative to constant-variance baselines while preserving the nominal trajectory, and it maintains an average detection delay equal to one sampling interval. A physical stepper-motor testbed confirms these findings: DynaMark triggers alarms rapidly, degrades control performance less, and outperforms existing benchmarks.
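The summary does not spell out the Bayesian belief update, so the following is only an assumed, textbook-style version: a two-hypothesis recursive Bayes filter on a scalar detector residual `z`. The sketch assumes honest residuals are distributed as N(0, sigma_w^2). Under replay, an unmatched watermark term inflates the variance to sigma_w^2 + 2*b^2*sigma_e^2, because one old watermark sample is present and the current one is missing. The names `update_attack_belief` and `gaussian_logpdf`, the `hazard` parameter (an assumed per-sample probability that an attack begins), and the variance model are all hypothetical.

```python
import math


def gaussian_logpdf(z, var):
    """Log density of a zero-mean Gaussian with variance var, evaluated at z."""
    return -0.5 * (math.log(2.0 * math.pi * var) + z * z / var)


def _sigmoid(t):
    """Numerically stable logistic function (maps log-odds to a probability)."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    et = math.exp(t)
    return et / (1.0 + et)


def update_attack_belief(belief, z, sigma_w, sigma_e, b=1.0, hazard=0.0):
    """One recursive Bayesian update of P(replay attack | residuals so far).

    Hypothetical model: honest z ~ N(0, sigma_w^2); under replay,
    z ~ N(0, sigma_w^2 + 2 * b^2 * sigma_e^2).
    """
    # Prediction step: an attack may begin between samples with prob. `hazard`.
    prior = belief + (1.0 - belief) * hazard
    if prior <= 0.0:
        return 0.0
    if prior >= 1.0:
        return 1.0
    var0 = sigma_w ** 2
    var1 = sigma_w ** 2 + 2.0 * (b * sigma_e) ** 2
    # Correction step: posterior log-odds = prior log-odds + log-likelihood ratio.
    log_lr = gaussian_logpdf(z, var1) - gaussian_logpdf(z, var0)
    log_odds = math.log(prior / (1.0 - prior)) + log_lr
    return _sigmoid(log_odds)
```

Feeding successive residuals through `update_attack_belief` yields a confidence in [0, 1] that could serve as a detection-confidence signal. A larger `sigma_e` pulls the two hypotheses further apart, so the belief rises faster after an attack, at the cost of more watermark energy. Working in log-odds keeps the update stable when the belief gets very close to 0 or 1.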

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21797/full.md

## Figures

34 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21797/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/2508.21797/full.md

---
Source: https://tomesphere.com/paper/2508.21797