Cross-Modal Alignment with Mixture Experts Neural Network for Intral-City Retail Recommendation
Po Li, Lei Li, Yan Fu, Jun Rong, Yu Zhang

TL;DR
This paper introduces CameNN, a multi-task mixture-of-experts neural network for intra-city retail recommendation. It improves image-text alignment and conversion rate (CVR) prediction for the fast-delivery grocery services that grew with COVID-19-driven demand.
Contribution
The paper presents a novel multi-task mixture-of-experts neural network architecture tailored for cross-modal alignment and retail recommendation, built on pre-trained models and expert gating mechanisms.
Findings
CameNN outperforms baseline models on real-world intra-city datasets.
Significant improvements in image and text representation quality.
Effective application of CameNN for CVR prediction in a leading Chinese intra-city platform.
Abstract
In this paper, we introduce the Cross-modal Alignment with mixture experts Neural Network (CameNN), a recommendation model for the intra-city retail industry, which provides fresh food and grocery retailing with delivery within 5 hours, a service that arose from the outbreak of the Coronavirus disease (COVID-19) pandemic around the world. CameNN is a multi-task model with three tasks: Image to Text Alignment (ITA), Text to Image Alignment (TIA), and CVR prediction. We use pre-trained BERT to generate the text embedding and pre-trained InceptionV4 to generate the image patch embedding (each image is split into small patches of the same size, and each patch is treated as an image token). Softmax gating networks then learn the weight of each transformer expert output and choose only a subset of experts conditioned on the input. Then a transformer encoder is applied as…
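The patch tokenization and sparse softmax gating described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the top-k selection rule, the patch size, and the linear stand-in experts (the paper uses transformer experts) are all assumptions made for illustration.

```python
import numpy as np


def split_into_patches(image, patch):
    """Split an (H, W, C) image into non-overlapping patch x patch tokens.

    Returns an array of shape (num_patches, patch * patch * C).
    The patch size is a hypothetical parameter, not taken from the paper.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = image.reshape(h // patch, patch, w // patch, patch, c)
    tokens = tokens.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return tokens.reshape(-1, patch * patch * c)


def top_k_softmax_gate(x, w_gate, k):
    """Softmax gate that keeps only the k highest-scoring experts.

    Assumption: "choose only a subset of experts" is modeled here as
    top-k selection followed by a softmax over the selected logits.
    """
    logits = x @ w_gate                      # one logit per expert
    top = np.argsort(logits)[-k:]            # indices of the k largest logits
    masked = np.full_like(logits, -np.inf)   # unselected experts get weight 0
    masked[top] = logits[top]
    exp = np.exp(masked - masked[top].max())  # numerically stable softmax
    return exp / exp.sum()


def mixture_output(x, w_gate, experts, k):
    """Weighted sum of the selected experts' outputs."""
    weights = top_k_softmax_gate(x, w_gate, k)
    out = sum(weights[i] * experts[i](x) for i in np.flatnonzero(weights))
    return out, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8, 3))
    tokens = split_into_patches(image, patch=4)   # 4 tokens of length 48

    # Linear maps stand in for the transformer experts (illustrative only).
    mats = [rng.normal(size=(48, 16)) for _ in range(4)]
    experts = [lambda v, m=m: v @ m for m in mats]
    w_gate = rng.normal(size=(48, 4))

    out, weights = mixture_output(tokens[0], w_gate, experts, k=2)
    print(tokens.shape, out.shape, weights)
```

Only the experts with nonzero gate weight are evaluated, which is what makes the gating conditional on the input rather than a plain weighted average over all experts.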
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Recommender Systems and Techniques · Image Retrieval and Classification Techniques
Methods: Linear Layer · Adam · Dense Connections · Weight Decay · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Layer Normalization · WordPiece
