FashionLOGO: Prompting Multimodal Large Language Models for Fashion Logo Embeddings
Zhen Wang, Da Li, Yulin Su, Min Yang, Minghui Qiu, Walton Wang

TL;DR
FashionLOGO uses multimodal large language models (MLLMs) to generate auxiliary text for product images, which improves logo embeddings and significantly enhances logo recognition and detection in e-commerce applications.
Contribution
This paper introduces FashionLOGO, a prompting approach that has MLLMs generate descriptive text for product images and combines this text with visual data to produce logo embeddings that outperform existing visual-only methods.
Findings
Achieves state-of-the-art performance on real-world logo datasets.
Generates more robust and generic logo embeddings.
Enhances logo recognition and detection accuracy.
Abstract
Logo embedding models convert the product logos in images into vectors, enabling their use for logo recognition and detection on e-commerce platforms. This facilitates the enforcement of intellectual property rights and enhances product search capabilities. However, current methods treat logo embedding as a purely visual problem, and a notable issue is that visual models capture features beyond the logo itself. Instead, we view this as a multimodal task, using text as auxiliary information to help the visual model understand the logo. Emerging Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in both visual and textual understanding. Inspired by this, we propose an approach, FashionLOGO, to explore how to prompt MLLMs to generate appropriate text for product images, which can help visual models achieve better logo…
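The abstract describes logo embeddings as vectors that are matched for recognition, with MLLM-generated text serving as auxiliary input. The following is a minimal sketch of that general idea under explicit assumptions. The visual and text vectors are given as toy inputs, so no MLLM or encoder is actually called. Fusion here is a simple weighted concatenation chosen only for illustration and is not the fusion mechanism used by FashionLOGO. Recognition is nearest-neighbor cosine search over a gallery of reference embeddings. All names (`fuse_embeddings`, `recognize_logo`, `text_weight`, `threshold`, the brand labels) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def fuse_embeddings(visual: Sequence[float], text: Sequence[float],
                    text_weight: float = 0.5) -> List[float]:
    """Hypothetical fusion (illustration only, not the paper's method):
    concatenate the normalized visual vector with a down-weighted,
    normalized text vector, then renormalize the result."""
    v = l2_normalize(visual)
    t = [text_weight * x for x in l2_normalize(text)]
    return l2_normalize(v + t)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for unit vectors this is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


def recognize_logo(query: Sequence[float], gallery: Dict[str, List[float]],
                   threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Return the best-matching gallery label, or None if the best
    similarity falls below the (hypothetical) acceptance threshold."""
    best_label, best_score = None, -1.0
    for label, ref in gallery.items():
        score = cosine(query, ref)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Toy gallery: each reference combines a visual vector and a text vector.
gallery = {
    "brand_a": fuse_embeddings([1.0, 0.0, 0.0], [1.0, 0.0]),
    "brand_b": fuse_embeddings([0.0, 1.0, 0.0], [0.0, 1.0]),
}

# A query whose visual features resemble brand_a and whose text agrees.
query = fuse_embeddings([0.9, 0.1, 0.0], [1.0, 0.0])
label, score = recognize_logo(query, gallery)

# A query unlike any gallery entry, with no useful text.
unknown = fuse_embeddings([0.0, 0.0, 1.0], [0.0, 0.0])
unknown_label, unknown_score = recognize_logo(unknown, gallery)
```

In this toy setup the text vector pulls the query further toward the matching brand, which mirrors the motivation of using text to steer the embedding toward the logo rather than other visual content. The threshold lets the matcher reject logos that are absent from the gallery instead of forcing a nearest match.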
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques
