Applying recent advances in Visual Question Answering to Record Linkage
Marko Smilevski

TL;DR
This paper adapts recent Visual Question Answering techniques to multi-modal record linkage, proposing neural network models that fuse visual and textual data to improve duplicate detection accuracy.
Contribution
It introduces two novel deep learning fusion modules inspired by VQA for multi-modal record linkage and validates them experimentally on the Avito Duplicate Advertisements Detection dataset.
Findings
Recurrent Neural Network + CNN fusion outperforms simple feature-based models.
Longer advertisements (>40 words) are more likely to be misclassified as similar.
Further research is needed on the Stacked Attention Network's impact on visual data integration.
Abstract
Multi-modal Record Linkage is the process of matching multi-modal records from multiple sources that represent the same entity. This field has not yet been explored in research, and we propose two solutions based on Deep Learning architectures inspired by recent work in Visual Question Answering. The neural networks we propose use two different fusion modules, the Recurrent Neural Network + Convolutional Neural Network fusion module and the Stacked Attention Network fusion module, each of which jointly combines the visual and the textual data of the records. The output of these fusion modules is the input of a Siamese Neural Network that computes the similarity of the records. Using data from the Avito Duplicate Advertisements Detection dataset, we train these solutions, and from the experiments we conclude that the Recurrent Neural Network + Convolutional Neural Network fusion module…
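The pipeline described in the abstract (fuse each record's text and image features, pass both fused records through one shared encoder, then score their similarity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fusion step is plain concatenation standing in for the paper's RNN + CNN and Stacked Attention Network modules, the encoder is a single untrained layer with random weights, and the names fuse, make_encoder, and record_similarity are hypothetical.

```python
import math
import random


def fuse(text_vec, image_vec):
    """Stand-in fusion: concatenate text and image features.

    Assumption: the paper's fusion modules are learned networks;
    concatenation is used here only to keep the sketch small.
    """
    return list(text_vec) + list(image_vec)


def make_encoder(in_dim, out_dim, seed=0):
    """Build one encoder whose weights are shared by both branches.

    Weights are random and untrained (illustrative only).
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(vec):
        # One dense layer with tanh activation.
        return [math.tanh(sum(w * x for w, x in zip(row, vec)))
                for row in weights]

    return encode


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def record_similarity(record_a, record_b, encode):
    """Siamese comparison: the same encoder embeds both fused records."""
    emb_a = encode(fuse(*record_a))
    emb_b = encode(fuse(*record_b))
    return cosine(emb_a, emb_b)


# Each record is a (text_features, image_features) pair of made-up numbers.
encode = make_encoder(in_dim=5, out_dim=4, seed=0)
ad_1 = ([0.2, 0.7, 0.1], [0.9, 0.3])
ad_2 = ([0.1, 0.1, 0.8], [0.2, 0.6])
score = record_similarity(ad_1, ad_2, encode)
```

Because both records pass through the same encoder, the score is symmetric in its two inputs, and a record compared with itself scores 1. A trained system would learn the fusion and encoder weights from labeled duplicate pairs rather than drawing them at random.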
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
