Multi-View Product Image Search Using Deep ConvNets Representations
Muhammet Bastan, Ozgur Yilmaz

TL;DR
This paper demonstrates that multi-view product image search using deep ConvNets significantly outperforms single-view queries and the classical bag-of-visual-words approach, highlighting the importance of multi-view data and of pre-training for cluttered images.
Contribution
The study evaluates deep ConvNets for multi-view product image retrieval, showing their superiority over traditional methods and emphasizing the need for pre-training on cluttered datasets.
Findings
Multi-view queries with ConvNets outperform single-view queries.
ConvNets outperform bag-of-visual-words in product image search.
Pre-training on cluttered datasets improves performance on mobile phone images.
Abstract
Multi-view product image queries can significantly improve retrieval performance over single-view queries. In this paper, we investigated the performance of deep convolutional neural networks (ConvNets) on multi-view product image search. First, we trained a VGG-like network to learn deep ConvNets representations of product images. Then, we computed the deep ConvNets representations of database and query images and performed single-view queries, as well as multi-view queries using several early and late fusion approaches. We performed extensive experiments on the publicly available Multi-View Object Image Dataset (MVOD 5K) with both clean background queries from the Internet and cluttered background queries from a mobile phone. We compared the performance of ConvNets to the classical bag-of-visual-words (BoWs). We concluded that (1) multi-view queries with deep ConvNets representations…
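The abstract mentions early and late fusion for multi-view queries without spelling out the variants in this excerpt. The sketch below illustrates the general idea only, under stated assumptions: each image is already represented by a feature vector, early fusion averages the L2-normalized query-view vectors into one query, and late fusion scores each view separately and merges the per-view similarities. The function names, the cosine similarity, the `max` combiner, and the toy data are all illustrative choices, not the authors' method.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def early_fusion_scores(query_views, database):
    """Early fusion (assumed variant): average the normalized view
    descriptors into one query vector, then score each database item once."""
    normed = [l2_normalize(v) for v in query_views]
    dim = len(normed[0])
    mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
    return {name: cosine(mean, feat) for name, feat in database.items()}


def late_fusion_scores(query_views, database, combine=max):
    """Late fusion (assumed variant): score every view separately, then
    merge the per-view similarities with `combine` (for example max or sum)."""
    return {
        name: combine(cosine(v, feat) for v in query_views)
        for name, feat in database.items()
    }


def rank(scores):
    """Return database item names ordered from most to least similar."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three database items and two views of one query object.
database = {
    "mug": [1.0, 0.0, 0.0],
    "shoe": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}
query_views = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]

print(rank(early_fusion_scores(query_views, database)))
print(rank(late_fusion_scores(query_views, database)))
```

In a real system the vectors would come from a trained network and the database would typically hold several views per product, so both sides of the comparison could be fused; this sketch keeps one vector per database item for brevity.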
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
