Two-stream Hierarchical Similarity Reasoning for Image-text Matching
Ran Chen, Hanli Wang, Lei Wang, Sam Kwong

TL;DR
This paper introduces a two-stream hierarchical similarity reasoning network for image-text matching. The network captures multi-level hierarchical similarities and computes similarity in both directions (image-to-text and text-to-image), improving results on the MSCOCO and Flickr30K datasets.
Contribution
It proposes a novel end-to-end framework combining hierarchical similarity reasoning with a two-stream architecture for more effective image-text matching.
Findings
Outperforms prior state-of-the-art methods on the MSCOCO and Flickr30K datasets.
Effectively captures multi-level hierarchical similarities.
Demonstrates the benefit of dual-direction similarity computation.
Abstract
Reasoning-based approaches have proven effective for image-text matching. This work addresses two issues in that task. First, in the reasoning process, conventional approaches cannot find and use multi-level hierarchical similarity information. To solve this problem, a hierarchical similarity reasoning module is proposed to automatically extract context information, which is then co-exploited with local interaction information for efficient reasoning. Second, previous approaches learn only single-stream similarity alignment (i.e., image-to-text level or text-to-image level), which does not fully use the available similarity information. To address this issue, a two-stream architecture is developed to decompose image-text matching into image-to-text level and text-to-image level similarity…
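To make the idea of dual-direction similarity concrete, here is a minimal sketch in plain Python. It is not the paper's network: the max-over-matches scoring, the simple averaging of the two directions, and all function names are assumptions chosen for illustration. It only shows that an image-to-text score and a text-to-image score can differ for the same pair, which is the motivation for computing both.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def directional_similarity(queries, contexts):
    """For each query vector, take its best cosine match among the
    context vectors, then average over all queries.
    (Illustrative scoring rule, assumed for this sketch.)"""
    best = [max(cosine(q, c) for c in contexts) for q in queries]
    return sum(best) / len(best)

def two_stream_similarity(regions, words):
    """Return (image-to-text, text-to-image, fused) scores.
    The equal-weight average used for fusion is an assumption."""
    i2t = directional_similarity(regions, words)  # regions query the words
    t2i = directional_similarity(words, regions)  # words query the regions
    return i2t, t2i, 0.5 * (i2t + t2i)

# Toy features: two image regions, one word.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0]]

i2t, t2i, fused = two_stream_similarity(regions, words)
print(i2t, t2i, fused)  # 0.5 1.0 0.75
```

In this toy case the single word is fully explained by one region (text-to-image score 1.0), while only half of the regions are explained by the text (image-to-text score 0.5), so a single-direction score would miss part of the picture.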
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
