HADA: A Graph-based Amalgamation Framework in Image-text Retrieval
Manh-Duy Nguyen, Binh T. Nguyen, Cathal Gurrin

TL;DR
HADA is a compact graph-based framework that enhances image-text retrieval by combining pretrained models through a graph neural network, achieving improved performance with minimal training resources.
Contribution
It introduces a novel graph structure to fuse pretrained models for image-text retrieval, reducing training complexity and resource requirements.
Findings
Increases baseline performance by over 3.6% on Flickr30k.
Requires only 1 GPU for training due to small parameter count.
Does not rely on external datasets for training.
Abstract
Many models have been proposed for vision and language tasks, especially image-text retrieval. All state-of-the-art (SOTA) models for this task contain hundreds of millions of parameters. They are also pretrained on large external datasets, which has been shown to substantially improve overall performance. It is not easy to propose a new model with a novel architecture and train it intensively on a massive dataset with many GPUs in order to surpass the many SOTA models that are already available online. In this paper, we propose a compact graph-based framework, named HADA, which combines pretrained models to produce a better result rather than building one from scratch. First, we create a graph structure in which the nodes are the features extracted from the pretrained models, with edges connecting them. The graph structure is employed to capture and fuse…
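The general idea in the abstract, treating features from several pretrained models as graph nodes and fusing them by passing information along the edges, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two-node graph, the edge weights, the single mean-aggregation step, the random projection `weight`, and the helper names `fuse` and `rank_candidates` are all assumptions made purely for illustration, and random vectors stand in for real pretrained features.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def fuse(node_feats, adjacency, weight):
    """One hypothetical message-passing step followed by mean pooling.

    node_feats: (num_nodes, dim) features, one node per pretrained model.
    adjacency:  (num_nodes, num_nodes) edge weights, including self-loops.
    weight:     (dim, dim) projection (random here; learned in a real model).
    """
    degree = adjacency.sum(axis=1, keepdims=True)
    messages = (adjacency @ node_feats) / degree   # weighted neighbor average
    hidden = np.tanh(messages @ weight)            # per-node update
    return l2_normalize(hidden.mean(axis=0))       # pool nodes into one vector


def rank_candidates(query, candidates):
    """Return candidate indices sorted by descending cosine similarity."""
    scores = candidates @ query                    # inputs are unit-norm
    return np.argsort(-scores)


rng = np.random.default_rng(0)
dim = 8
weight = rng.normal(scale=0.5, size=(dim, dim))
# Assumed edge weights: strong self-loop, weaker cross-model edge.
adjacency = np.array([[1.0, 0.5],
                      [0.5, 1.0]])

# Stand-ins for features of one image from two pretrained models.
image_nodes = rng.normal(size=(2, dim))
image_embedding = fuse(image_nodes, adjacency, weight)

# Stand-ins for five candidate captions, each with two feature nodes.
caption_nodes = rng.normal(size=(5, 2, dim))
caption_embeddings = np.stack(
    [fuse(nodes, adjacency, weight) for nodes in caption_nodes]
)

ranking = rank_candidates(image_embedding, caption_embeddings)
print(ranking)
```

In a trained system the projection and edge weights would be learned, and only this small fusion component would need training while the pretrained feature extractors stay fixed, which is consistent with the small parameter count reported above.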
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Graph Neural Network
