Optimising the Input Image to Improve Visual Relationship Detection

Noel Mizzi; Adrian Muscat

arXiv:1903.11029 · cs.CV · March 27, 2019 · 1 cites

TL;DR

This paper investigates how different image preprocessing techniques affect visual relationship detection, finding that the Union-WB-and-B method significantly improves predicate prediction by enabling CNNs to better identify subjects and objects.

Contribution

The study introduces and evaluates alternative preprocessing methods, demonstrating that Union-WB-and-B enhances CNN performance in visual relationship detection.

Findings

1. Union-WB-and-B outperforms the standard Union method.
2. Preprocessing improves predicate prediction accuracy.
3. CNNs can identify subjects and objects earlier in processing.
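
The abstract names Recall@1 and per-predicate recall as evaluation measures for predicate prediction. As a rough illustration only, the sketch below shows how the two could be computed when each subject-object pair has exactly one ground-truth predicate and the model's top-scoring predicate is taken as its prediction. The function names and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def recall_at_1(y_true, y_pred):
    """Fraction of pairs whose top-1 predicted predicate matches the label.

    Assumes one ground-truth predicate per subject-object pair.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def per_predicate_recall(y_true, y_pred):
    """Recall computed separately for every predicate seen in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {pred: hits[pred] / n for pred, n in totals.items()}


# Toy labels, invented for illustration.
y_true = ["on", "on", "under", "next to"]
y_pred = ["on", "under", "under", "on"]
overall = recall_at_1(y_true, y_pred)
by_predicate = per_predicate_recall(y_true, y_pred)
```

Reporting the per-predicate figures next to the overall score helps show whether a gain comes from a few frequent predicates or is spread across the label set.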

Abstract

Visual Relationship Detection is the task of predicting the correct relation between a subject and an object in a given image. To improve the visual part of this difficult problem, ten preprocessing methods were tested to determine whether the widely used Union method yields the optimal results. To focus solely on predicate prediction, no object detection or linguistic knowledge was used, so that neither could affect the comparison. Once fine-tuned, the Visual Geometry Group (VGG) models were evaluated using Recall@1, per-predicate recall, activation maximisations, class activation maps, and error analysis. The research found that preprocessing methods such as the Union-Without-Background-and-with-Binary-mask (Union-WB-and-B) method yield significantly better results than the widely used Union method since, as designed, it enables the…
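
To make the difference between the two named preprocessing methods more concrete, here is a minimal NumPy sketch. The Union crop follows the usual definition: the smallest box enclosing both the subject and object boxes. The `union_wb_and_b` function is only an assumed reading of the name "Union-Without-Background-and-with-Binary-mask": pixels outside both boxes are zeroed, and binary masks marking the subject and the object are returned alongside the crop. The excerpt above does not state how the paper builds or combines the mask, so the function names, the `(x1, y1, x2, y2)` box format, and the two-plane mask layout are all assumptions.

```python
import numpy as np


def union_box(subj, obj):
    """Smallest (x1, y1, x2, y2) box enclosing both input boxes."""
    return (min(subj[0], obj[0]), min(subj[1], obj[1]),
            max(subj[2], obj[2]), max(subj[3], obj[3]))


def box_mask(shape_hw, box):
    """Boolean H x W mask that is True inside the box."""
    mask = np.zeros(shape_hw, dtype=bool)
    x1, y1, x2, y2 = box
    mask[y1:y2, x1:x2] = True
    return mask


def union_crop(image, subj, obj):
    """Plain Union preprocessing: crop the image to the union box."""
    x1, y1, x2, y2 = union_box(subj, obj)
    return image[y1:y2, x1:x2].copy()


def union_wb_and_b(image, subj, obj):
    """Assumed variant: blank the background and return binary masks.

    Returns the union crop with every pixel outside both boxes set to 0,
    plus an H x W x 2 uint8 array (plane 0: subject, plane 1: object).
    """
    hw = image.shape[:2]
    subj_m, obj_m = box_mask(hw, subj), box_mask(hw, obj)
    keep = subj_m | obj_m
    # Broadcast the H x W mask over the colour channels.
    blanked = np.where(keep[..., None], image, 0).astype(image.dtype)
    x1, y1, x2, y2 = union_box(subj, obj)
    crop = blanked[y1:y2, x1:x2]
    masks = np.stack([subj_m, obj_m], axis=-1)[y1:y2, x1:x2].astype(np.uint8)
    return crop, masks


# Toy 10 x 12 RGB image with hypothetical subject and object boxes.
image = np.full((10, 12, 3), 7, dtype=np.uint8)
subject_box, object_box = (1, 2, 4, 6), (6, 3, 10, 9)
plain = union_crop(image, subject_box, object_box)
crop, masks = union_wb_and_b(image, subject_box, object_box)
```

In the plain crop the network sees everything inside the union box, including unrelated background between the two entities. In the assumed variant only the two boxes keep their pixels, and the masks state explicitly which region is the subject and which is the object.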

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Video Surveillance and Tracking Methods