Can We Get Rid of Handcrafted Feature Extractors? SparseViT: Nonsemantics-Centered, Parameter-Efficient Image Manipulation Localization through Spare-Coding Transformer
Lei Su, Xiaochen Ma, Xuekang Zhu, Chaoqun Niu, Zeyu Lei, Ji-Zhe Zhou

TL;DR
SparseViT introduces a sparse, parameter-efficient transformer that adaptively extracts non-semantic features for image manipulation localization, outperforming traditional handcrafted methods in generalization and efficiency.
Contribution
The paper proposes SparseViT, a sparse self-attention transformer that removes the need for handcrafted feature extractors and extracts non-semantic features adaptively for image manipulation localization (IML).
Findings
Outperforms existing models in generalization across datasets.
Reduces model size and FLOPs by up to 80%.
Achieves superior accuracy without handcrafted features.
Abstract
Non-semantic features or semantic-agnostic features, which are irrelevant to image context but sensitive to image manipulations, are recognized as evidential to Image Manipulation Localization (IML). Since manual labels are impossible, existing works rely on handcrafted methods to extract non-semantic features. Handcrafted non-semantic features jeopardize an IML model's generalization ability in unseen or complex scenarios. Therefore, for IML, the elephant in the room is: How to adaptively extract non-semantic features? Non-semantic features are context-irrelevant and manipulation-sensitive. That is, within an image, they are consistent across patches unless manipulation occurs. Then, sparse and discrete interactions among image patches are sufficient for extracting non-semantic features. However, image semantics vary drastically on different patches, requiring dense and continuous…
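To make the idea of "sparse and discrete interactions among image patches" concrete, here is a minimal sketch in plain Python. It assumes one possible sparsification scheme, a strided grouping in which patches attend only to other patches in the same group; the excerpt above does not specify the paper's exact mechanism, and the names `sparse_groups`, `attention_pairs`, and `stride` are hypothetical, chosen for illustration only.

```python
def sparse_groups(height, width, stride):
    """Partition a height x width patch grid into stride * stride groups.

    Assumption (not taken from the abstract): patches (r, c) that share
    (r % stride, c % stride) fall into the same group, so each group is a
    regularly spaced, non-contiguous subset of the grid.
    """
    groups = {}
    for r in range(height):
        for c in range(width):
            groups.setdefault((r % stride, c % stride), []).append((r, c))
    return groups


def attention_pairs(height, width, stride):
    """Count query-key pairs when attention is restricted to each group."""
    groups = sparse_groups(height, width, stride)
    return sum(len(members) ** 2 for members in groups.values())


# stride=1 puts every patch in one group, i.e. ordinary dense attention.
dense = attention_pairs(8, 8, 1)   # 64 * 64 = 4096 pairs
# stride=2 yields 4 groups of 16 patches each.
sparse = attention_pairs(8, 8, 2)  # 4 * 16 * 16 = 1024 pairs
print(dense, sparse)
```

Under this assumed scheme the number of attention-score pairs shrinks by a factor of `stride ** 2`. The count covers attention scores only, so it should not be read as a reproduction of the overall FLOPs figure reported for the model.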
Taxonomy
Topics: Image Processing Techniques and Applications · Digital Media Forensic Detection · Advanced Image Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Vision Transformer · Dropout · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection
