Mutual Query Network for Multi-Modal Product Image Segmentation
Yun Guo, Wei Feng, Zheng Zhang, Xiancong Ren, Yaoyu Li, Jingjing Lv, Xin Zhu, Zhangang Lin, Jingping Shao

TL;DR
This paper introduces a mutual query network that leverages both visual and linguistic information from product titles and images to improve segmentation accuracy in e-commerce, addressing limitations of visual-only methods.
Contribution
It proposes a mutual query network with two cross-modal modules, language query vision and vision query language, that align and correlate visual and linguistic features. It also introduces a new large-scale Multi-Modal Product Segmentation (MMPS) dataset.
Findings
Outperforms state-of-the-art methods on the MMPS dataset.
Effectively filters irrelevant content using linguistic cues.
Enhances product segmentation accuracy in e-commerce applications.
Abstract
Product image segmentation is vital in e-commerce. Most existing methods extract the product image foreground based only on the visual modality, which makes it difficult to distinguish irrelevant products. As product titles contain abundant appearance information and provide complementary cues for product image segmentation, we propose a mutual query network to segment products based on both visual and linguistic modalities. First, we design a language query vision module to obtain the response of the language description in image areas, thus aligning the visual and linguistic representations across modalities. Then, a vision query language module uses the correlation between the visual and linguistic modalities to filter the product title and effectively suppress title content that is irrelevant to the image. To promote research in this field, we also construct a Multi-Modal Product…
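The abstract describes the two modules only at a high level. As a rough illustration, the NumPy sketch below shows one way a language-query-vision step and a vision-query-language step could be wired with plain dot-product attention and a similarity gate. Everything in it is an assumption for illustration, not the paper's implementation: the function names (`language_query_vision`, `vision_query_language`, `region_relevance`), the mean-pooled visual descriptor, the sigmoid gate, and the temperature `tau` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def language_query_vision(vis, words):
    """Hypothetical LQV step: title words are queries, image regions are keys/values.

    vis:   (N, D) region features; words: (T, D) title-word features.
    Returns attn (T, N), the response of each word over the image regions,
    and grounded (T, D), each word re-expressed as a mix of region features.
    """
    d = vis.shape[1]
    attn = softmax(words @ vis.T / np.sqrt(d), axis=1)
    grounded = attn @ vis
    return attn, grounded


def vision_query_language(vis, words, tau=0.1):
    """Hypothetical VQL step: a pooled visual descriptor scores each word.

    Words with low cosine similarity to the image get a gate near 0,
    softly suppressing title content that has no visual counterpart.
    `tau` is an assumed temperature, not a value from the paper.
    """
    g = vis.mean(axis=0)
    cos = words @ g / (np.linalg.norm(words, axis=1) * np.linalg.norm(g) + 1e-8)
    gate = 1.0 / (1.0 + np.exp(-cos / tau))
    return gate, words * gate[:, None]


def region_relevance(attn, gate):
    # Per-region score: LQV responses averaged with VQL gates as weights,
    # so filtered-out words contribute little to the foreground estimate.
    return (gate[:, None] * attn).sum(axis=0) / gate.sum()


if __name__ == "__main__":
    vis = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
    words = np.array([[1.0, 0.0, 0.0, 0.0],   # word matching the product
                      [0.0, 0.0, 0.0, 1.0]])  # word with no visual counterpart
    attn, grounded = language_query_vision(vis, words)
    gate, filtered = vision_query_language(vis, words)
    print(np.round(gate, 3))
    print(np.round(region_relevance(attn, gate), 3))
```

The sketch keeps the two query directions separate: words querying regions yields a per-word spatial response, while the image querying words yields a per-word relevance gate, and combining them lets visually unsupported title words contribute less to the region scores. A trained model would add learned projections and operate on dense feature maps rather than a few hand-set vectors.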
Taxonomy
Topics: Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques · Text and Document Classification Technologies
