DEMO: A Statistical Perspective for Efficient Image-Text Matching
Fan Zhang, Xian-Sheng Hua, Chong Chen, Xiao Luo

TL;DR
This paper introduces DEMO, an unsupervised hashing method for image-text matching that uses distribution divergence and consistency learning to build a more accurate semantic similarity structure and improve retrieval performance.
Contribution
DEMO employs a statistical approach with distribution divergence and collaborative consistency learning to enhance image-text matching accuracy.
Findings
Outperforms state-of-the-art methods on benchmark datasets
Robust semantic similarity structure construction
Effective self-supervised consistency enforcement
Abstract
Image-text matching has been a long-standing problem, which seeks to connect vision and language through semantic understanding. Due to the capability to manage large-scale raw data, unsupervised hashing-based approaches have gained prominence recently. They typically construct a semantic similarity structure using the natural distance, which subsequently provides guidance to the model optimization process. However, the similarity structure could be biased at the boundaries of semantic distributions, causing error accumulation during sequential optimization. To tackle this, we introduce a novel hashing approach termed Distribution-based Structure Mining with Consistency Learning (DEMO) for efficient image-text matching. From a statistical view, DEMO characterizes each image using multiple augmented views, which are considered as samples drawn from its intrinsic semantic distribution.…
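To make the statistical view concrete, here is a minimal sketch of comparing two images by treating each image's augmented views as samples from a distribution and measuring the divergence between the two sample sets. The excerpt above does not say which divergence DEMO uses, so the energy distance is an assumed stand-in chosen only for illustration. The feature vectors are simulated with Gaussian noise rather than produced by a real encoder, and all function names are hypothetical.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_pairwise_distance(xs, ys):
    """Average distance over all pairs (x, y) with x in xs and y in ys."""
    total = sum(euclidean(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Energy distance between two sample sets (assumed divergence, not
    necessarily the one used in the paper)."""
    return (
        2.0 * mean_pairwise_distance(xs, ys)
        - mean_pairwise_distance(xs, xs)
        - mean_pairwise_distance(ys, ys)
    )


def simulate_views(center, n_views, noise, rng):
    """Stand-in for encoding augmented views: noisy copies of one feature."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n_views)]


rng = random.Random(0)
views_a = simulate_views([0.0, 0.0], n_views=8, noise=0.1, rng=rng)
views_b = simulate_views([0.05, 0.0], n_views=8, noise=0.1, rng=rng)  # near A
views_c = simulate_views([3.0, 3.0], n_views=8, noise=0.1, rng=rng)   # far from A

d_ab = energy_distance(views_a, views_b)
d_ac = energy_distance(views_a, views_c)
print(f"divergence(A, B) = {d_ac and d_ab:.4f}")
print(f"divergence(A, C) = {d_ac:.4f}")
```

A smaller divergence would mark a pair as more similar when building the similarity structure. Because each image is represented by a set of views rather than a single point, one noisy feature near a semantic boundary has less influence on the result.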
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Natural Language Processing Techniques
