UVL2: A Unified Framework for Video Tampering Localization
Pengfei Pei

TL;DR
This paper introduces UVL2, a comprehensive video tampering localization framework that leverages specialized feature extraction modules and a two-stage CNN-Transformer approach to enhance detection accuracy and robustness against various forgeries.
Contribution
The paper presents a novel unified framework combining multiple feature extraction modules with a CNN-Transformer architecture for improved video tampering localization.
Findings
Significantly outperforms existing methods in detection accuracy.
Demonstrates robustness across different types of video forgeries.
Effectively captures generalized forgery traces.
Abstract
With the advancement of deep learning-driven video editing technology, security risks have emerged. Malicious video tampering can lead to public misunderstanding, property losses, and legal disputes. Current detection methods are mostly limited to specific datasets, perform poorly on unknown forgeries, and lack robustness to processed data. This paper proposes an effective video tampering localization network that significantly improves the detection performance for video inpainting and splicing by extracting more generalized features of forgery traces. Considering the inherent differences between tampered videos and original videos, such as edge artifacts, pixel distribution, texture features, and compression information, we have specifically designed four modules to independently extract these features. Furthermore, to seamlessly integrate these features, we…
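The abstract names four cues that are extracted independently: edge artifacts, pixel distribution, texture, and compression information. The excerpt does not describe how the paper's modules are built, so the following is only a rough illustration of the idea of computing four separate per-pixel feature maps and stacking them for a downstream localizer. Every function name and every filter choice here (gradient magnitude, local-mean residual, local variance, 8x8 block-boundary jumps) is an assumption made for this sketch, not the authors' method.

```python
import numpy as np

def box_mean(x, k=3):
    """Mean over a k x k neighbourhood, with edge padding (hypothetical helper)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def edge_map(x):
    """Edge cue: gradient magnitude (assumed stand-in for an edge module)."""
    gy, gx = np.gradient(x)
    return np.hypot(gx, gy)

def pixel_residual(x):
    """Pixel-distribution cue: deviation of each pixel from its local mean."""
    return x - box_mean(x)

def texture_map(x):
    """Texture cue: local variance, clipped at zero against rounding error."""
    m = box_mean(x)
    return np.maximum(box_mean(x * x) - m * m, 0.0)

def block_boundary_map(x, block=8):
    """Compression cue: intensity jumps across an assumed 8x8 coding grid."""
    out = np.zeros_like(x)
    dh = np.abs(np.diff(x, axis=1))  # dh[:, j] = |x[:, j+1] - x[:, j]|
    dv = np.abs(np.diff(x, axis=0))
    out[:, block - 1:-1:block] += dh[:, block - 1::block]
    out[block - 1:-1:block, :] += dv[block - 1::block, :]
    return out

def extract_features(frame):
    """Stack the four independent cue maps into an (H, W, 4) array."""
    x = np.asarray(frame, dtype=np.float64)
    return np.stack(
        [edge_map(x), pixel_residual(x), texture_map(x), block_boundary_map(x)],
        axis=-1,
    )
```

Keeping each cue in its own channel mirrors the abstract's point that the features are extracted independently before being integrated; how UVL2 actually fuses them is not recoverable from the truncated text above.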
Taxonomy
Topics: Digital Media Forensic Detection · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Softmax · Label Smoothing · Linear Layer · Adam · Dropout · Layer Normalization
