Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Sparse-Coding Transformer

Lei Su; Xiaochen Ma; Xuekang Zhu; Chaoqun Niu; Zeyu Lei; Ji-Zhe Zhou

arXiv:2412.14598·cs.CV·December 24, 2024

TL;DR

SparseViT introduces a sparse, parameter-efficient transformer that adaptively extracts non-semantic features for image manipulation localization, outperforming traditional handcrafted methods in generalization and efficiency.

Contribution

The paper proposes SparseViT, a novel sparse self-attention transformer that eliminates handcrafted feature extractors and enhances non-semantic feature extraction for IML.

Findings

01 Outperforms existing models in generalization across datasets.

02 Reduces model size and FLOPs by up to 80%.

03 Achieves superior accuracy without handcrafted features.

Abstract

Non-semantic features, or semantic-agnostic features, which are irrelevant to image context but sensitive to image manipulations, are recognized as evidential to Image Manipulation Localization (IML). Since manual labeling is impossible, existing works rely on handcrafted methods to extract non-semantic features. Handcrafted non-semantic features jeopardize an IML model's generalization ability in unseen or complex scenarios. Therefore, for IML, the elephant in the room is: how can non-semantic features be extracted adaptively? Non-semantic features are context-irrelevant and manipulation-sensitive. That is, within an image, they are consistent across patches unless manipulation occurs. Sparse and discrete interactions among image patches are therefore sufficient for extracting non-semantic features. However, image semantics vary drastically on different patches, requiring dense and continuous…
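
The idea of "sparse and discrete interactions among image patches" can be illustrated with a small sketch. It assumes one plausible reading, in which patches interact only with other patches that lie a fixed stride apart on the patch grid. The function names, the `stride` parameter, and the grouping rule are hypothetical illustrations and are not taken from the paper or its repository.

```python
from collections import defaultdict


def sparse_groups(height, width, stride):
    """Group patch coordinates on a height x width grid.

    Assumption for illustration: two patches interact only if their row and
    column indices agree modulo `stride`, so each group holds patches that
    are spread discretely across the whole image.
    """
    groups = defaultdict(list)
    for row in range(height):
        for col in range(width):
            groups[(row % stride, col % stride)].append((row, col))
    return dict(groups)


def attention_pair_count(height, width, stride):
    """Count query-key pairs when attention is restricted to each group."""
    groups = sparse_groups(height, width, stride)
    return sum(len(members) ** 2 for members in groups.values())


# stride=1 puts every patch in one group, i.e. ordinary dense attention.
dense_pairs = attention_pair_count(8, 8, 1)
# stride=2 splits the 64 patches into 4 groups of 16 patches each.
sparse_pairs = attention_pair_count(8, 8, 2)
print(dense_pairs, sparse_pairs)  # 4096 1024
```

Under this assumed grouping, the number of pairwise interactions on an 8×8 patch grid drops from 4096 to 1024 at stride 2, which is the kind of saving a sparse interaction pattern is meant to provide. The figures here are a property of the toy grouping only and say nothing about the model's reported FLOPs.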

Code & Models

Repositories

scu-zjz/sparsevit · PyTorch · Official

Videos

Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization Through Sparse-Coding Transformer · Underline

Taxonomy

Topics: Image Processing Techniques and Applications · Digital Media Forensic Detection · Advanced Image Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Vision Transformer · Dropout · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection