GreatSplicing: A Semantically Rich Splicing Dataset
Jiaming Liang, Yuwan Xue, Haowei Liu, Zhenqi Dai, Yu Liao, Rui Wang, Weihao Jiang, Yaping Liu, Zhikun Chen, Guoxiao Liu, Bo Liu, Xiuli Bi

TL;DR
GreatSplicing is a large-scale, high-quality dataset with diverse semantic categories designed to improve splicing forgery detection models and address overfitting and benchmarking issues in the field.
Contribution
The paper introduces GreatSplicing, a new dataset of 5,000 spliced images whose spliced regions span 335 semantic categories, intended to improve both the training and the evaluation of splicing detection models.
Findings
Models trained on GreatSplicing show lower misidentification rates than models trained on existing splicing datasets.
Models trained on GreatSplicing also generalize better across datasets.
GreatSplicing provides a more comprehensive benchmark for future research.
Abstract
In existing splicing forgery datasets, the insufficient semantic variety of spliced regions causes trained detection models to overfit semantic features rather than learn genuine splicing traces. Meanwhile, the lack of a reasonable benchmark dataset has led to inconsistent experimental settings across existing detection methods. To address these issues, we propose GreatSplicing, a manually created, large-scale, high-quality splicing dataset. GreatSplicing comprises 5,000 spliced images and covers spliced regions across 335 distinct semantic categories, enabling detection models to learn splicing traces more effectively. Empirical results show that detection models trained on GreatSplicing achieve low misidentification rates and stronger cross-dataset generalization than models trained on existing datasets. GreatSplicing is now publicly available for research purposes at the following link.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection
