RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network
Minchul Shin, Yoonjae Cho, Byungsoo Ko, Geonmo Gu

TL;DR
This paper introduces RTIC, a residual-learning architecture for image-text composition, together with a graph convolutional network (GCN)-based joint training technique. By effectively encoding the differences between the source and target images conditioned on the text, it achieves state-of-the-art retrieval performance.
Contribution
The paper proposes a new architecture for image-text composition and a GCN-based joint training technique. Both improve retrieval accuracy, and the training technique can be applied to existing composition methods in a plug-and-play manner.
Findings
Achieved state-of-the-art scores on multiple benchmarks.
Proposed a unified training environment for fair comparison.
Demonstrated that the GCN-based joint training technique consistently improves retrieval performance.
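The summary does not say how RTIC builds its graph or what its nodes represent, so the sketch below shows only the standard graph-convolution propagation rule (Kipf and Welling), H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), that a GCN-based technique builds on. It is not the paper's joint training method. The function name `gcn_layer` and the toy inputs are hypothetical, and plain Python lists are used so the example has no dependencies.

```python
import math

def gcn_layer(adj, feats, weight):
    """One generic GCN layer (hypothetical helper, not RTIC's code):
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Symmetric degree normalization of the adjacency matrix.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features: norm @ feats.
    d_in = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(d_in)]
           for i in range(n)]
    # Linear projection (agg @ weight) followed by ReLU.
    d_out = len(weight[0])
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]

# Two connected nodes with an identity projection: each node ends up
# with the normalized average of both feature vectors.
out = gcn_layer([[0, 1], [1, 0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(out)  # [[0.5, 0.5], [0.5, 0.5]]
```

In a trained model `weight` would be learned and several such layers stacked; how the paper connects this stream to the composition network is not covered here.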
Abstract
In this paper, we study the compositional learning of images and texts for image retrieval. The query is given in the form of an image and text that describes the desired modifications to the image; the goal is to retrieve the target image that satisfies the given modifications and resembles the query by composing information in both the text and image modalities. To address this task, we propose a novel architecture designed for image-text composition and show that the proposed structure can effectively encode the differences between the source and target images conditioned on the text. Furthermore, we introduce a new joint training technique based on the graph convolutional network that is generally applicable to any existing composition method in a plug-and-play manner. We found that the proposed technique consistently improves performance and achieves state-of-the-art scores on…
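The retrieval setup described in the abstract can be sketched as: compose the query image feature with the text feature, then rank gallery images by similarity to the composed query. The sketch below assumes a residual form (image feature plus a text-conditioned correction), as the title suggests, and cosine similarity for ranking. In the paper the correction is produced by the learned RTIC network; here `residual_fn` and the toy vectors are hypothetical stand-ins.

```python
import math

def compose(img_feat, txt_feat, residual_fn):
    """Residual composition: keep the query image feature and add a
    text-conditioned correction computed by residual_fn (hypothetical)."""
    delta = residual_fn(img_feat, txt_feat)
    return [x + d for x, d in zip(img_feat, delta)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve(query, gallery):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

# Toy stand-in for a learned composer: scale the text feature into a correction.
toy_residual = lambda img, txt: [0.5 * t for t in txt]

query = compose([1.0, 0.0], [0.0, 2.0], toy_residual)   # [1.0, 1.0]
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(retrieve(query, gallery)[0])  # 2
```

The residual form means that when the text asks for little change, the composed query stays close to the source image, and the network only has to learn the modification itself.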
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
