TL;DR
This paper introduces a set-covering-based methodology for generating diverse training databases for steganalysis. Selecting representative processing pipelines improves robustness against cover source mismatch.
Contribution
It proposes a novel set-covering greedy algorithm to create relevant databases that enhance steganalysis performance under operational conditions.
Findings
For a given number of training samples, set-covering selection outperforms random pipeline selection.
Denoising, sharpening, and downsampling increase diversity.
Generated databases show good generalization on various benchmarks.
Abstract
Within an operational framework, covers used by a steganographer are likely to come from different sensors and different processing pipelines than the ones used by researchers for training their steganalysis models. Thus, a performance gap is unavoidable when it comes to out-of-distribution covers, an extremely frequent scenario called Cover Source Mismatch (CSM). Here, we explore a grid of processing pipelines to study the origins of CSM, to better understand it, and to better tackle it. A greedy set-covering algorithm is used to select representative pipelines, minimizing the maximum regret between each representative and the pipelines within its set. Our main contribution is a methodology for generating relevant databases able to tackle operational CSM. Experimental validation highlights that, for a given number of training samples, our set covering selection is a better strategy than…
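The greedy selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a precomputed regret matrix in which `regret[i][j]` is the performance lost when a detector trained on pipeline `i` is tested on pipeline `j`, and a tolerance `tau` below which pipeline `i` is said to cover pipeline `j`. The names `greedy_cover`, `max_regret`, `REGRET`, and `tau`, as well as the numbers in the toy matrix, are hypothetical.

```python
def greedy_cover(regret, tau):
    """Greedy set cover: pick representative pipelines until every pipeline
    is within regret `tau` of at least one representative.

    `regret[i][j]` is assumed to be the regret of training on pipeline i
    and testing on pipeline j (hypothetical input, not from the paper).
    """
    n = len(regret)
    # covers[i] = pipelines that a detector trained on i handles within tau
    covers = [{j for j in range(n) if regret[i][j] <= tau} for i in range(n)]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Greedy step: take the pipeline covering the most uncovered ones
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            raise ValueError("some pipelines cannot be covered at this tau")
        chosen.append(best)
        uncovered -= gained
    return chosen


def max_regret(regret, chosen):
    """Worst-case regret over all pipelines, each served by its best representative."""
    n = len(regret)
    return max(min(regret[i][j] for i in chosen) for j in range(n))


# Toy 4-pipeline regret matrix with made-up values, for illustration only
REGRET = [
    [0.00, 0.02, 0.30, 0.25],
    [0.03, 0.00, 0.28, 0.35],
    [0.40, 0.33, 0.00, 0.04],
    [0.31, 0.29, 0.05, 0.00],
]

if __name__ == "__main__":
    reps = greedy_cover(REGRET, tau=0.05)
    print(reps, max_regret(REGRET, reps))
```

In this toy setting, lowering `tau` yields more representatives and a smaller worst-case regret, while raising it yields fewer representatives. The training database would then be built from the selected pipelines only.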
