FITRep: Attention-Guided Item Representation via MLLMs
Guoxiao Zhang, Ao Li, Tan Qu, Qianlong Xie, Xingxing Wang

TL;DR
This paper introduces FITRep, a novel attention-guided framework utilizing MLLMs for fine-grained item deduplication by preserving structural relationships, leading to improved online advertising performance.
Contribution
It presents the first white-box, attention-guided item representation method that leverages hierarchical semantic extraction and structure-preserving compression for deduplication.
Findings
Achieves +3.60% CTR in online tests.
Achieves +4.25% CPM in online tests.
Demonstrates effectiveness and real-world impact in online advertising.
Abstract
Online platforms often suffer degraded user experience because of near-duplicate items with similar visuals and text. While Multimodal Large Language Models (MLLMs) enable multimodal embedding, existing methods treat representations as black boxes and ignore structural relationships (e.g., primary vs. auxiliary elements), which leads to the local structural collapse problem. To address this, inspired by Feature Integration Theory (FIT), we propose FITRep, the first attention-guided, white-box item representation framework for fine-grained item deduplication. FITRep consists of: (1) Concept Hierarchical Information Extraction (CHIE), which uses MLLMs to extract hierarchical semantic concepts; (2) Structure-Preserving Dimensionality Reduction (SPDR), an adaptive UMAP-based method for efficient information compression; and (3) FAISS-Based Clustering (FBC), a FAISS-based clustering step that assigns…
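As a rough illustration of the general idea behind embedding-based deduplication (not FITRep's actual method), the sketch below groups items whose embedding vectors exceed a cosine-similarity threshold. It assumes item embeddings are already available, uses brute-force pairwise search in pure Python as a stand-in for a FAISS index, and merges matches with union-find. The function names, the threshold value, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_near_duplicates(embeddings, threshold=0.95):
    """Return one group label per item; near-duplicates share a label.

    Brute-force O(n^2) comparison, a stand-in for an approximate
    nearest-neighbour index. The threshold is a hypothetical value.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the group representative.
                    parent[max(ri, rj)] = min(ri, rj)

    return [find(i) for i in range(n)]


# Toy embeddings (hypothetical): items 0 and 1 are near-duplicates, as are 2 and 3.
items = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
]
groups = group_near_duplicates(items, threshold=0.95)
print(groups)
```

Each label is the smallest item index in its group, so items with the same label can be treated as one deduplicated item. At production scale the pairwise loop would be replaced by an approximate nearest-neighbour search.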
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Advanced Graph Neural Networks · Topic Modeling
