# Image Privacy Prediction Using Deep Neural Networks

**Authors:** Ashwini Tonge, Cornelia Caragea

arXiv: 1903.03695 · 2019-03-12

## TL;DR

This paper investigates automatic image privacy prediction using deep neural network features and textual tags, demonstrating that combining visual and tag features improves privacy classification accuracy.

## Contribution

It analyzes deep CNN features extracted from four pre-trained architectures (AlexNet, GoogLeNet, VGG-16, and ResNet) for privacy prediction. It also explores combining these visual features with textual features (user tags and deep tags) to improve performance.

## Key findings

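- ResNet features outperform those of the other CNN architectures for privacy prediction, and models trained on them outperform prior state-of-the-art models.
- Models trained on visual features perform better than models trained on tag features.
- Combining deep visual features with image tags (user tags and deep tags) improves performance over either feature set alone.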
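Visual features and tags can be combined in several ways. The simplest is early fusion: concatenate the two feature vectors and train a single classifier on the result. The sketch below assumes that scheme. It also L2-normalizes each block first so that neither feature set dominates by scale. Both choices are illustrative assumptions and are not taken from the paper. The function names `l2_normalize` and `fuse_features` are hypothetical, and the input vectors are toy stand-ins for a pooled CNN feature vector and a binary bag-of-tags vector.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_features(visual, tags):
    """Early fusion (assumed scheme): normalize each block, then concatenate."""
    return l2_normalize(visual) + l2_normalize(tags)


# Hypothetical inputs: a tiny stand-in for a pooled CNN feature vector
# and a binary bag-of-tags vector (1.0 = tag present).
visual = [3.0, 4.0]
tags = [1.0, 0.0, 1.0]
fused = fuse_features(visual, tags)
print(len(fused))  # 5
print(fused[:2])   # [0.6, 0.8]
```

The fused vector can then be passed to any linear classifier, such as an SVM. Because each block has unit norm, the tag block carries comparable weight even when the visual vector is far higher-dimensional.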

## Abstract

Images today are increasingly shared online on social networking sites such as Facebook, Flickr, Foursquare, and Instagram. Current social networking sites allow users to change their privacy preferences, but doing so is often cumbersome for the vast majority of users on the Web, who find it difficult to assign and manage privacy settings. Automatically predicting the privacy of images has therefore become a necessity in our interconnected world, because it can warn users about private or sensitive content before they upload these images to social networking sites.

In this paper, we explore learning models that automatically predict the appropriate privacy of images as private or public using carefully identified image-specific features. We study deep visual semantic features derived from various layers of Convolutional Neural Networks (CNNs), as well as textual features such as user tags and deep tags generated from deep CNNs. In particular, we extract deep (visual and tag) features from four pre-trained CNN architectures for object recognition (AlexNet, GoogLeNet, VGG-16, and ResNet) and compare their performance for image privacy prediction. We ran experiments on a Flickr dataset of over thirty thousand images. The results show that learning models trained on features extracted from ResNet outperform the state-of-the-art models for image privacy prediction. We further investigate the combination of user tags and deep tags derived from CNN architectures in two settings: (1) an SVM on bag-of-tags features; and (2) a text-based CNN. Models trained on visual features perform better than those trained on tag features. Even so, combining deep visual features with image tags improves performance over the individual feature sets.
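The first tag-based setting is an SVM on bag-of-tags features. The sketch below illustrates it without dependencies, under stated assumptions; it is not the authors' implementation. The toy tags and labels are hypothetical. Tags are encoded as binary presence vectors, and the linear SVM is trained with a Pegasos-style stochastic subgradient method (no bias term, for brevity) in place of whatever solver the paper uses. The helper names `build_vocab`, `bag_of_tags`, `train_linear_svm`, and `predict` are hypothetical.

```python
import random

# Hypothetical toy data: (user tags, label). +1 = private, -1 = public.
TRAIN = [
    (["family", "kids", "home"], 1),
    (["wedding", "family", "friends"], 1),
    (["kids", "birthday"], 1),
    (["landscape", "sky", "mountain"], -1),
    (["beach", "sky", "sunset"], -1),
    (["architecture", "city"], -1),
]


def build_vocab(tag_lists):
    """Assign each distinct (lower-cased) tag a column index, sorted for determinism."""
    vocab = sorted({t.lower() for tags in tag_lists for t in tags})
    return {tag: i for i, tag in enumerate(vocab)}


def bag_of_tags(tags, vocab):
    """Binary bag-of-tags vector; tags missing from the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for t in tags:
        idx = vocab.get(t.lower())
        if idx is not None:
            vec[idx] = 1.0
    return vec


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized
    hinge loss (no bias term, for brevity). Labels must be -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (private) if the score is non-negative, else -1 (public)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


vocab = build_vocab(tags for tags, _ in TRAIN)
X = [bag_of_tags(tags, vocab) for tags, _ in TRAIN]
y = [label for _, label in TRAIN]
w = train_linear_svm(X, y)
print(predict(w, bag_of_tags(["family", "birthday"], vocab)))  # 1 (private)
print(predict(w, bag_of_tags(["sky", "city"], vocab)))  # -1 (public)
```

In practice a mature solver (for example, scikit-learn's `LinearSVC`) with a tuned regularization strength would replace the hand-rolled trainer, which is here only to keep the sketch self-contained. The tie-breaking rule is an arbitrary conservative choice made for this sketch: a score of exactly zero, as for an image with no known tags, is labeled private.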

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.03695/full.md

## Figures

36 figures with captions in the complete paper: https://tomesphere.com/paper/1903.03695/full.md

## References

111 references — full list in the complete paper: https://tomesphere.com/paper/1903.03695/full.md

---
Source: https://tomesphere.com/paper/1903.03695