Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Andres Mafla; Sounak Dey; Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas

arXiv:2009.09809·cs.CV·September 22, 2020

TL;DR

This paper introduces a multi-modal reasoning graph that combines visual and textual cues using graph convolutional networks to improve fine-grained image classification and retrieval.

Contribution

It proposes a novel multi-modal reasoning framework that integrates textual and visual features via GCNs for enhanced scene-text based image analysis.

Findings

01 Outperforms the previous state-of-the-art in fine-grained classification.

02 Achieves superior results in image retrieval tasks.

03 Effectively leverages scene text for improved visual understanding.

Abstract

Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems. In this paper, we focus on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval. First, we obtain the text instances from images by employing a text reading system. Then, we combine textual features with salient image regions to exploit the complementary information carried by the two sources. Specifically, we employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between salient objects and text found in an image. By obtaining an enhanced set of visual and textual features, the proposed model greatly outperforms the previous state-of-the-art in two…
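
The reasoning step described in the abstract can be illustrated with a minimal NumPy sketch: salient-region features and scene-text features are stacked as the nodes of one graph, and a single graph-convolution step mixes information between them. This is not the authors' implementation. The feature dimension, the softmax affinity used as the adjacency matrix, the residual connection, the mean pooling, and all names (`build_affinity`, `gcn_layer`, `visual`, `textual`) are assumptions made for illustration, and random vectors stand in for real detector and text-reader outputs.

```python
import numpy as np


def build_affinity(nodes):
    """Row-normalised affinity between all node pairs (softmax over scaled dot products).

    Assumption: the abstract does not state how edges are defined.
    """
    scores = nodes @ nodes.T / np.sqrt(nodes.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def gcn_layer(nodes, adjacency, weight):
    """One graph-convolution step with a residual connection: ReLU(A X W) + X."""
    return np.maximum(adjacency @ nodes @ weight, 0.0) + nodes


rng = np.random.default_rng(0)
dim = 16

# Hypothetical inputs, both assumed to be already projected to a common dimension.
visual = rng.normal(size=(5, dim))   # 5 salient image-region features
textual = rng.normal(size=(3, dim))  # 3 scene-text instance embeddings

# Visual and textual features become the nodes of a single graph.
nodes = np.concatenate([visual, textual], axis=0)
adjacency = build_affinity(nodes)

# In a trained model this matrix would be learned; here it is random.
weight = rng.normal(scale=0.1, size=(dim, dim))

# Relationship-enhanced node features, then a pooled image-level descriptor.
enhanced = gcn_layer(nodes, adjacency, weight)
image_descriptor = enhanced.mean(axis=0)
```

In a setup like this, `image_descriptor` could feed a classifier head or be compared across images for retrieval. How the paper pools and uses the enhanced features is not stated in the abstract.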

Code & Models

Repositories

AndresPMD/GCN_classification (PyTorch)
