RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network

Minchul Shin; Yoonjae Cho; Byungsoo Ko; Geonmo Gu

arXiv:2104.03015 · cs.CV · October 27, 2021 · 23 citations

TL;DR

This paper introduces RTIC, a novel architecture for image-text composition that effectively encodes the differences between source and target images conditioned on text, together with a graph convolutional network-based joint training technique; the combination achieves state-of-the-art retrieval performance.

Contribution

The paper proposes a new architecture for image-text composition, along with a joint training technique that improves retrieval accuracy and can be applied to existing composition methods in a plug-and-play manner.
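According to the abstract, the joint training technique is based on a graph convolutional network. For readers unfamiliar with that building block, the following is a minimal sketch of one generic graph-convolution layer in the widely used Kipf & Welling formulation, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). It is not RTIC's training procedure: how the paper builds its graph and attaches the GCN to a composition model is not described on this page, so the function names, adjacency matrix, node features and weights here are all hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization of the
    Kipf & Welling GCN (self-loops added so every node keeps its own feature)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degree of each node including its self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One generic graph convolution: ReLU(A_norm @ H @ W).
    adj: n x n adjacency, h: n x d node features, w: d x d' weights (all hypothetical)."""
    propagated = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, v) for v in row] for row in propagated]


if __name__ == "__main__":
    # Two connected nodes: each output mixes both nodes' features equally.
    print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

Each layer replaces a node's feature with a degree-normalized average over itself and its neighbours before a learned projection, which is what lets information flow between related samples in a graph.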

Findings

01

Achieved state-of-the-art scores on multiple benchmarks.

02

Proposed a unified training environment for fair comparison.

03

Demonstrated the effectiveness of the graph convolutional network-based approach.

Abstract

In this paper, we study the compositional learning of images and texts for image retrieval. The query is given as an image together with text that describes the desired modifications to that image; the goal is to retrieve a target image that satisfies the given modifications and resembles the query, by composing information from both the text and image modalities. To address this task, we propose a novel architecture designed for image-text composition and show that the proposed structure can effectively encode the differences between the source and target images conditioned on the text. Furthermore, we introduce a new joint training technique based on the graph convolutional network that is generally applicable to any existing composition method in a plug-and-play manner. We found that the proposed technique consistently improves performance and achieves state-of-the-art scores on…
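The retrieval setting described in the abstract can be illustrated with a minimal sketch, assuming the composed query is the image feature plus a residual predicted from the concatenated image and text features (the "residual learning" idea named in the title), and that targets are ranked by cosine similarity. The single linear-plus-ReLU residual branch, the function names and the toy feature vectors are hypothetical stand-ins, not RTIC's actual architecture.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def compose_residual(img: Vector, txt: Vector, w: Matrix) -> Vector:
    """Hypothetical residual composition: query = img + f([img; txt]).
    f is assumed to be one linear layer + ReLU; img + txt concatenates the lists."""
    residual = relu(matvec(w, img + txt))
    return [i + r for i, r in zip(img, residual)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Vector, gallery: List[Vector]) -> List[int]:
    """Gallery indices sorted by descending cosine similarity to the query."""
    return sorted(range(len(gallery)), key=lambda k: cosine(query, gallery[k]), reverse=True)


if __name__ == "__main__":
    img, txt = [1.0, 0.0], [0.0, 1.0]               # toy source-image and text features
    w = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # toy residual weights
    query = compose_residual(img, txt, w)
    print(query)  # [1.0, 1.0]
    print(retrieve(query, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))  # [2, 0, 1]
```

Keeping the image feature on the identity path means the composition branch only has to learn the text-conditioned change, which matches the abstract's framing of encoding differences between source and target images.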

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques