After All, Only The Last Neuron Matters: Comparing Multi-modal Fusion Functions for Scene Graph Generation

Mohamed Karim Belaid

arXiv:2011.04779 · cs.CV · November 11, 2020


TL;DR

This paper compares fusion functions for the final module of a Scene Graph Generation (SGG) model and finds that an adapted DIST function achieves the best Recall@K, making it part of the state of the art.

Contribution

It adds two model-agnostic fusion functions to the final module of an SGG model, an adapted version of DIST and a mixture of MFB and GATE, and evaluates them against the existing SUM and GATE functions.

Findings

01

DIST outperforms SUM and GATE in Recall@K

02

Adding new model-agnostic fusion functions to the final module can improve scene graph generation performance

03

The adapted DIST becomes part of the state-of-the-art
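The findings are stated in terms of Recall@K. As a rough illustration of what that metric counts, the sketch below scores a ranked list of predicted (subject, predicate, object) triplets against the ground-truth triplets of an image and averages over images. It is a simplified sketch under stated assumptions, not the paper's evaluation code: triplets are matched by exact hypothetical integer identifiers, whereas real SGG benchmarks may also match bounding boxes by IoU depending on the evaluation setting. The names `recall_at_k` and `dataset_recall_at_k` are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple

# Hypothetical, simplified encoding: (subject_index, predicate_class, object_index).
Triplet = Tuple[int, int, int]
Scored = Tuple[Triplet, float]  # a predicted triplet and its confidence score


def recall_at_k(predictions: Sequence[Scored], ground_truth: Set[Triplet], k: int) -> float:
    """Fraction of ground-truth triplets that appear among the k highest-scored predictions."""
    if not ground_truth:
        raise ValueError("Recall@K is undefined for an image without ground-truth triplets")
    # Rank predictions by confidence, highest first, and keep the top k.
    ranked = sorted(predictions, key=lambda item: item[1], reverse=True)
    top_k = {triplet for triplet, _score in ranked[:k]}
    # Assumption: exact identifier matching stands in for box/label matching.
    return len(top_k & ground_truth) / len(ground_truth)


def dataset_recall_at_k(images: Iterable[Tuple[Sequence[Scored], Set[Triplet]]], k: int) -> float:
    """Average per-image Recall@K, skipping images that have no ground-truth triplets."""
    scores = [recall_at_k(preds, gt, k) for preds, gt in images if gt]
    if not scores:
        raise ValueError("no image has ground-truth triplets")
    return sum(scores) / len(scores)


preds = [((0, 1, 2), 0.9), ((0, 3, 2), 0.8), ((1, 1, 0), 0.4), ((2, 5, 1), 0.3)]
gt = {(0, 1, 2), (2, 5, 1)}
print(recall_at_k(preds, gt, 1))  # 0.5
print(recall_at_k(preds, gt, 4))  # 1.0
```

Because only the top K predictions count, a fusion function that pushes correct predicates higher in the ranking raises Recall@K even when the set of candidate triplets is unchanged.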

Abstract

From object segmentation to word vector representations, Scene Graph Generation (SGG) has become a complex task built upon numerous research results. In this paper, we focus on the last module of the SGG model: the fusion function, whose role is to combine three hidden states. We perform an ablation study to compare different implementations. First, we reproduce the state-of-the-art results using the SUM and GATE functions. Then we extend the original solution with more model-agnostic functions: an adapted version of DIST and a mixture of MFB and GATE. Starting from the state-of-the-art configuration, DIST achieved the best Recall@K, which makes it part of the state of the art.
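The abstract describes the fusion function as the module that combines three hidden states. The sketch below illustrates the idea for SUM and GATE only, assuming the three inputs are per-predicate logit vectors coming from three branches, hypothetically named `ctx`, `vis` and `frq` here. The GATE form used (one vector scaled by a sigmoid of the sum of the other two) is an assumed formulation for illustration, not taken from the paper; DIST and the MFB/GATE mixture are not sketched because their definitions are not given in this summary.

```python
import math
from typing import List

Vector = List[float]  # one logit per predicate class


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fuse_sum(ctx: Vector, vis: Vector, frq: Vector) -> Vector:
    """SUM fusion: element-wise sum of the three logit vectors."""
    return [c + v + f for c, v, f in zip(ctx, vis, frq)]


def fuse_gate(ctx: Vector, vis: Vector, frq: Vector) -> Vector:
    """GATE fusion (assumed form): ctx logits scaled by a sigmoid gate over vis + frq."""
    return [c * sigmoid(v + f) for c, v, f in zip(ctx, vis, frq)]


def softmax(logits: Vector) -> Vector:
    """Turn fused logits into a predicate probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three predicate classes from the three branches.
ctx = [2.0, 0.5, -1.0]
vis = [0.0, 1.0, 0.0]
frq = [1.0, -1.0, 0.0]

fused = fuse_sum(ctx, vis, frq)
print(fused)  # [3.0, 0.5, -1.0]
probs = softmax(fused)
print(probs.index(max(probs)))  # 0
```

The design point is that every fusion variant maps the same three inputs to one output vector, so swapping SUM for GATE (or another function) changes only this last step, which is what makes the comparison in the paper an ablation of the final module.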


Code & Models

Repositories

Karim-53/SGG
PyTorch · Official


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques