Automatic Creative Selection with Cross-Modal Matching
Alex Kim, Jia Huang, Rob Monarch, Jerry Kwac, Anikesh Kamath, Parmeshwar Khurd, Kailash Thiyagarajan, Goodman Gu

TL;DR
This paper introduces a novel method for matching app images to search terms using a fine-tuned LXMERT model, significantly improving accuracy over existing models like CLIP and Transformer-ResNet baselines.
Contribution
The work presents a new approach that fine-tunes a pre-trained LXMERT model for cross-modal matching of images and text, outperforming previous models in app advertising relevance tasks.
Findings
Achieves 0.96 AUC on advertiser data, outperforming baselines by 8-14%.
Achieves 0.95 AUC on human ratings, outperforming baselines by 16-17%.
Significantly improves image-text matching accuracy in app advertising.
Abstract
Application developers advertise their Apps by creating product pages with App images and bidding on search terms. It is therefore crucial for App images to be highly relevant to the search terms. Solutions to this problem require an image-text matching model to predict the quality of the match between the chosen image and the search terms. In this work, we present a novel approach to matching an App image to search terms based on fine-tuning a pre-trained LXMERT model. We show that, compared to the CLIP model and a baseline that uses a Transformer model for search terms and a ResNet model for images, we significantly improve the matching accuracy. We evaluate our approach using two sets of labels: advertiser-associated (image, search term) pairs for a given application, and human ratings for the relevance between (image, search term) pairs. Our approach achieves 0.96 AUC score for advertiser…
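The results in this summary are reported as AUC over scored (image, search term) pairs. As a rough illustration of what that metric measures, the sketch below computes ROC AUC as the fraction of (relevant, irrelevant) pairs in which the relevant pair receives the higher match score. The function name, scores, and labels are hypothetical and are not taken from the paper; the paper's own evaluation code is not shown in this summary.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (the Mann-Whitney U formulation).

    scores: model match scores, one per (image, search term) pair.
    labels: 1 if the pair is labeled relevant, 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    # Count how often a relevant pair outscores an irrelevant one.
    # Ties count as half a win.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores and labels, for illustration only.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auc = roc_auc(example_scores, example_labels)
```

An AUC of 1.0 means every relevant pair outscores every irrelevant one, and 0.5 is what random scoring would give. The quadratic loop is kept for readability; a rank-based implementation would be preferable on large evaluation sets.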
Taxonomy
Topics: Artificial Intelligence in Games · Human Motion and Animation · Video Analysis and Summarization
