Video Inpainting Localization with Contrastive Learning
Zijie Lou, Gang Cao, Man Lin

TL;DR
This paper introduces ViLocal, a contrastive learning-based method for detecting inpainted regions in videos by analyzing noise residuals, achieving superior localization accuracy over existing techniques.
Contribution
The paper presents a supervised contrastive learning approach built on a 3D Uniformer encoder, together with a new 2500-video dataset with pixel-level annotations, for video inpainting localization.
Findings
ViLocal outperforms state-of-the-art methods in localization accuracy.
A new dataset with 2500 videos and pixel-level annotations is introduced.
Contrastive learning enhances discriminative forensic features.
Abstract
Deep video inpainting is typically used as malicious manipulation to remove important objects for creating fake videos. It is significant to identify the inpainted regions blindly. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual for learning effective spatiotemporal forensic features. To enhance the discriminative power, supervised contrastive learning is adopted to capture the local inconsistency of inpainted videos through attracting/repelling the positive/negative pristine and forged pixel pairs. A pixel-wise inpainting localization map is yielded by a lightweight convolution decoder with a specialized two-stage training strategy. To prepare enough training samples, we build a video object segmentation dataset of 2500 videos…
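The attract/repel idea in the abstract can be illustrated with a generic supervised contrastive loss over pixel embeddings. The sketch below is a minimal NumPy version under stated assumptions, not the authors' implementation: the function name `pixel_supcon_loss`, the `temperature` value, and the toy embeddings are hypothetical, and the loss form is the standard supervised contrastive formulation rather than one confirmed by the paper. Pixels with the same label (pristine or forged) are treated as positives and pulled together; pixels with different labels act as negatives and are pushed apart.

```python
import numpy as np


def pixel_supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over N pixel embeddings.

    features: (N, D) array of per-pixel embeddings (hypothetical encoder output).
    labels:   (N,) array, 0 = pristine pixel, 1 = forged (inpainted) pixel.
    temperature: assumed value; the paper's setting is not given here.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = labels.shape[0]

    # L2-normalise so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)
    logits = (z @ z.T) / temperature

    not_self = ~np.eye(n, dtype=bool)
    # Positive pairs: same label, excluding each pixel paired with itself.
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax over all other pixels, shifted by the row max for stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))

    # Average log-probability of the positives for every anchor that has any.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob_pos.mean())


# Toy example: two tight clusters of 2-D embeddings.
toy_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned = pixel_supcon_loss(toy_features, np.array([0, 0, 1, 1]))
misaligned = pixel_supcon_loss(toy_features, np.array([0, 1, 0, 1]))
print(aligned, misaligned)
```

When the labels agree with the cluster structure of the embeddings, the loss is lower than when they cut across it, which is the pressure that drives pristine and forged pixel features apart during training.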
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Face recognition and analysis
Methods: Inpainting · Contrastive Learning · Convolution
