LSSD: a Controlled Large JPEG Image Database for Deep-Learning-based Steganalysis "into the Wild"
Hugo Ruiz, Mehdi Yedroudj, Marc Chaumont, Frédéric Comby, Gérard Subsol

TL;DR
This paper introduces LSSD, a JPEG image database of 2 million images, designed to support large-scale deep-learning steganalysis research by providing diverse, controlled, and publicly available data.
Contribution
The paper presents LSSD, a large, diverse, and controlled JPEG image database, together with the methodology used to build it, so that the steganalysis community can run large-scale experiments.
Findings
LSSD contains 2 million colour and grey-scale JPEG images, compared with about ten thousand in the databases commonly used for steganalysis.
The database is publicly available for research use.
The paper describes the pipeline used to build the database and a general methodology for redeveloping it to further increase its diversity.
Abstract
For many years, the image databases used in steganalysis have been relatively small, i.e. about ten thousand images. This limits the diversity of images and thus prevents large-scale analysis of steganalysis algorithms. In this paper, we describe a large JPEG database composed of 2 million colour and grey-scale images. This database, named LSSD for Large Scale Steganalysis Database, was obtained thanks to the intensive use of "controlled" development procedures. LSSD has been made publicly available, and we hope it will be used by the steganalysis community for large-scale experiments. We introduce the pipeline used for building various versions of the image database. We detail the general methodology that can be used to redevelop the entire database and further increase its diversity. We also discuss the computational and storage costs of developing the images.
