GreatSplicing: A Semantically Rich Splicing Dataset

Jiaming Liang; Yuwan Xue; Haowei Liu; Zhenqi Dai; Yu Liao; Rui Wang; Weihao Jiang; Yaping Liu; Zhikun Chen; Guoxiao Liu; Bo Liu; Xiuli Bi

arXiv:2310.10070·cs.CV·November 17, 2025·1 cites

TL;DR

GreatSplicing is a large-scale, high-quality dataset with diverse semantic categories designed to improve splicing forgery detection models and address overfitting and benchmarking issues in the field.

Contribution

The paper introduces GreatSplicing, a manually created dataset of 5,000 spliced images whose spliced regions span 335 semantic categories, intended for both training and benchmarking splicing detection models.

Findings

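01. Models trained on GreatSplicing show lower misidentification rates than models trained on existing splicing datasets.

02. Detection models trained on GreatSplicing generalize better across datasets.

03. GreatSplicing is proposed as a common benchmark to make experimental settings consistent across detection methods.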

Abstract

In existing splicing forgery datasets, the insufficient semantic variety of spliced regions causes trained detection models to overfit semantic features rather than learn genuine splicing traces. Meanwhile, the lack of a reasonable benchmark dataset has led to inconsistent experimental settings across existing detection methods. To address these issues, we propose GreatSplicing, a manually created, large-scale, high-quality splicing dataset. GreatSplicing comprises 5,000 spliced images and covers spliced regions across 335 distinct semantic categories, enabling detection models to learn splicing traces more effectively. Empirical results show that detection models trained on GreatSplicing achieve low misidentification rates and stronger cross-dataset generalization than models trained on existing datasets. GreatSplicing is now publicly available for research purposes at the following link.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection