Sensitive Image Classification by Vision Transformers
Hanxian He, Campbell Wilson, Thanh Thi Nguyen, Janis Dalins

TL;DR
This paper demonstrates that vision transformer models outperform conventional CNNs, such as ResNet, and other established methods at classifying sensitive images such as pornographic content, because their self-attention captures global interactions among image patches and reduces ambiguity in attention maps.
Contribution
The study applies vision transformers to sensitive image classification and shows that they outperform conventional models and established methods in this domain.
Findings
Vision transformers outperform ResNet models in classification accuracy.
They effectively reduce ambiguity in attention maps for sensitive images.
Transformers surpass existing CNN-based methods like Bumble and attention-based CNNs.
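Attention maps like those mentioned above are typically derived from a ViT's per-layer attention weights. This summary does not say how the paper computed its maps, so the sketch below shows one commonly used technique, attention rollout, purely as an illustration. It averages the heads in each layer, mixes in the identity to account for residual connections, and multiplies the layers together so that the class-token row (assumed here to be token 0) can be reshaped into a patch-level heatmap. The layer count, head count, 4x4 patch grid, and random attention weights are all hypothetical.

```python
import numpy as np

def attention_rollout(layer_attentions):
    """Combine per-layer attention into one token-to-token relevance matrix.

    layer_attentions: list of arrays shaped (heads, N, N), first layer first,
    where each row is a softmax distribution over the N tokens.
    Returns an (N, N) row-stochastic matrix.
    """
    n = layer_attentions[0].shape[-1]
    identity = np.eye(n)
    rollout = identity
    for attn in layer_attentions:
        a = attn.mean(axis=0)                  # average the heads of this layer
        a = 0.5 * a + 0.5 * identity           # model the residual (skip) connection
        a = a / a.sum(axis=-1, keepdims=True)  # keep each row summing to 1
        rollout = a @ rollout                  # compose with the earlier layers
    return rollout

def cls_attention_map(rollout, grid):
    """Take the class-token row (token 0) and reshape its patch part to a grid."""
    return rollout[0, 1:].reshape(grid, grid)

def random_attention(rng, heads, n):
    """Hypothetical stand-in for real ViT attention: row-wise softmax of noise."""
    logits = rng.normal(size=(heads, n, n))
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: 2 layers, 3 heads, 1 class token + 4x4 patches = 17 tokens.
rng = np.random.default_rng(0)
layers = [random_attention(rng, 3, 17) for _ in range(2)]
rollout = attention_rollout(layers)
heatmap = cls_attention_map(rollout, grid=4)
print(heatmap.shape)  # (4, 4)
```

In practice the heatmap would be upsampled to the input resolution and overlaid on the image; a sharper, less diffuse map is one way to read "reduced ambiguity".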
Abstract
When it comes to classifying child sexual abuse images, managing similar inter-class correlations and diverse intra-class correlations poses a significant challenge. Vision transformer models, unlike conventional deep convolutional network models, leverage a self-attention mechanism to capture global interactions among contextual local elements. This allows them to navigate through image patches effectively, avoiding incorrect correlations and reducing ambiguity in attention maps, thus proving their efficacy in computer vision tasks. Rather than directly analyzing child sexual abuse data, we constructed two datasets: one comprising clean and pornographic images and another with three classes, which additionally include images indicative of pornography, sourced from Reddit and Google Open Images data. In our experiments, we also employ an adult content image benchmark dataset. These…
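The abstract's description of self-attention over image patches can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation. It splits an image into non-overlapping patches, embeds each patch with a linear projection, and applies one head of scaled dot-product self-attention, so each patch's output is a weighted mix of all patches in the image. The patch size of 8, the embedding width of 64, and the random weights are arbitrary assumptions. A full ViT would also add a class token, position embeddings, layer normalization, MLP blocks, and multiple heads and layers.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles,
    each flattened to a vector: returns (num_patches, patch * patch * C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (N, D) tokens.
    Returns the (N, D_v) outputs and the (N, N) attention weights."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row is a distribution over all patches
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))           # hypothetical 32x32 RGB input
patches = patchify(image, patch=8)        # 16 patches of 8*8*3 = 192 values
d_model = 64
embed = rng.normal(scale=0.02, size=(patches.shape[1], d_model))
tokens = patches @ embed                  # linear patch embedding: (16, 64)
w_q, w_k, w_v = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
                 for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16, 64) (16, 16)
```

Because every row of `attn` spans all 16 patches, context from anywhere in the image can influence each patch's representation. This global view is what the abstract contrasts with the local receptive fields of convolutional layers.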
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Attention Is All You Need · Average Pooling · Residual Connection · Layer Normalization · Global Average Pooling · Linear Layer · Kaiming Initialization · Softmax · Max Pooling · Dense Connections
