Sensitive Image Classification by Vision Transformers

Hanxian He; Campbell Wilson; Thanh Thi Nguyen; Janis Dalins

arXiv:2412.16446·cs.CV·December 24, 2024

Open Access

TL;DR

This paper demonstrates that vision transformer models outperform traditional CNNs and other methods in classifying sensitive images, such as pornographic content, by effectively capturing global interactions and reducing ambiguity.

Contribution

The study introduces the application of vision transformers to sensitive image classification, showing their superiority over conventional models and established methods in this domain.

Findings

01. Vision transformers outperform ResNet models in classification accuracy.

02. Vision transformers reduce ambiguity in attention maps for sensitive images.

03. Vision transformers surpass existing CNN-based methods such as Bumble and attention-based CNNs.

Abstract

When it comes to classifying child sexual abuse images, managing similar inter-class correlations and diverse intra-class correlations poses a significant challenge. Vision transformer models, unlike conventional deep convolutional network models, leverage a self-attention mechanism to capture global interactions among contextual local elements. This allows them to navigate through image patches effectively, avoiding incorrect correlations and reducing ambiguity in attention maps, thus proving their efficacy in computer vision tasks. Rather than directly analyzing child sexual abuse data, we constructed two datasets: one comprising clean and pornographic images and another with three classes, which additionally includes images indicative of pornography, sourced from Reddit and Google Open Images data. In our experiments, we also employ an adult content image benchmark dataset. These…
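
To illustrate the self-attention mechanism the abstract refers to, here is a minimal NumPy sketch of single-head scaled dot-product attention over image patches. It is not the authors' model: the helper names (`image_to_patches`, `self_attention`), the 32x32 random input, the patch size of 8, the embedding width of 64, and the random projection weights are all assumptions chosen for illustration. It shows only how every patch can attend to every other patch, which is what lets a vision transformer capture global interactions.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Each row holds one patch's similarity to every patch (global context).
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical input: a random 32x32 RGB image stands in for real data.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = image_to_patches(image, patch_size=8)

# Randomly initialised projections; a trained model would learn these.
d_model = 64
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], d_model))
tokens = patches @ w_embed
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))

out, attn = self_attention(tokens, w_q, w_k, w_v)
print(patches.shape, out.shape, attn.shape)
```

Each row of `attn` is a probability distribution over all 16 patches, so it can be reshaped to the 4x4 patch grid and viewed as a coarse attention map for that patch. A full vision transformer adds position embeddings, multiple heads, residual connections, and layer normalization, none of which are shown here.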

Taxonomy

Topics: Industrial Vision Systems and Defect Detection

Methods: Attention Is All You Need · Average Pooling · Residual Connection · Layer Normalization · Global Average Pooling · Linear Layer · Kaiming Initialization · Softmax · Max Pooling · Dense Connections