Privacy Preserving Visual Question Answering

Cristian-Paul Bara; Qing Ping; Abhinav Mathur; Govind Thattai; Rohith MV; Gaurav S. Sukhatme

arXiv:2202.07712·cs.CV·February 17, 2022

Open Access

TL;DR

This paper presents a privacy-preserving approach for visual question answering that uses a compact, symbolic scene representation to protect image privacy while maintaining effective VQA performance.

Contribution

The paper introduces a hybrid method that pairs a small vision model with a symbolic scene representation, reducing model size and keeping the original image private in VQA tasks.

Findings

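01. The vision model is more than 25 times smaller than current state-of-the-art (SOTA) vision models and 100 times smaller than end-to-end SOTA VQA models.

02. The symbolic representation is non-differentiable, so it cannot be used to recover the original image, which keeps the image private.

03. The paper reports a detailed error analysis and discusses the trade-offs of using a distilled vision model and a symbolic scene representation.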

Abstract

We introduce a novel privacy-preserving methodology for performing Visual Question Answering on the edge. Our method constructs a symbolic representation of the visual scene, using a low-complexity computer vision model that jointly predicts classes, attributes and predicates. This symbolic representation is non-differentiable, which means it cannot be used to recover the original image, thereby keeping the original image private. Our proposed hybrid solution uses a vision model which is more than 25 times smaller than the current state-of-the-art (SOTA) vision models, and 100 times smaller than end-to-end SOTA VQA models. We report detailed error analysis and discuss the trade-offs of using a distilled vision model and a symbolic representation of the visual scene.
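The idea of a symbolic scene representation built from classes, attributes and predicates can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the SceneObject and Relation types, the discretize, subjects_of and attributes_of helpers, and the detector scores are all hypothetical. The sketch only shows how continuous model outputs might be reduced to discrete symbols and how a simple question could then be answered from those symbols alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    obj_id: int
    label: str                             # predicted class symbol
    attributes: frozenset = frozenset()    # predicted attribute symbols


@dataclass(frozen=True)
class Relation:
    subject_id: int
    predicate: str                         # predicted predicate symbol
    object_id: int


def discretize(scores):
    """Keep only the top-scoring symbol and drop the scores themselves."""
    return max(scores, key=scores.get)


def subjects_of(objects, relations, predicate, target_label):
    """Labels of objects that stand in `predicate` to an object labelled `target_label`."""
    by_id = {o.obj_id: o for o in objects}
    return [
        by_id[r.subject_id].label
        for r in relations
        if r.predicate == predicate and by_id[r.object_id].label == target_label
    ]


def attributes_of(objects, label):
    """Sorted attribute symbols of every object carrying `label`."""
    return sorted(a for o in objects if o.label == label for a in o.attributes)


# Made-up detector scores for one image region; only the winning symbol is stored.
dog_label = discretize({"dog": 0.81, "cat": 0.12, "sofa": 0.07})

# Hypothetical scene: a brown dog on a red sofa.
objects = [
    SceneObject(0, dog_label, frozenset({"brown"})),
    SceneObject(1, "sofa", frozenset({"red"})),
]
relations = [Relation(0, "on", 1)]

print(subjects_of(objects, relations, "on", "sofa"))   # "What is on the sofa?"
print(attributes_of(objects, "dog"))                   # "What color is the dog?"
```

In this sketch the question-answering step sees only the symbol lists, never pixels or score vectors. The argmax in discretize is one simple way to obtain a non-differentiable representation; the paper's actual vision model and reasoning component may work differently.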

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition