M2FN: Multi-step Modality Fusion for Advertisement Image Assessment

Kyung-Wha Park (1), Jung-Woo Ha (2), JungHoon Lee (3), Sunyoung Kwon (4), Kyung-Min Kim (2), Byoung-Tak Zhang (1, 5, 6) ((1) Interdisciplinary Program in Neuroscience, Seoul National University; (2) NAVER AI LAB, NAVER CLOVA; (3) Statistics and Actuarial Science, Soongsil University; (4) School of Biomedical Convergence Engineering, Pusan National University; (5) Department of Computer Science and Engineering, Seoul National University; (6) Surromind Robotics)

arXiv:2102.00441·cs.CV·February 10, 2021


TL;DR

This paper introduces M2FN, a multi-step neural network that effectively fuses auxiliary image attributes, including embedded text, to improve advertisement image preference prediction, achieving state-of-the-art results on real-world datasets.

Contribution

The paper proposes a novel multi-step modality fusion network (M2FN) that leverages auxiliary image attributes, such as embedded text, for ad image assessment. Prior deep learning approaches to this task did not use these attributes.

Findings

01

M2FN outperforms existing methods in preference prediction accuracy.

02

Utilizing auxiliary attributes significantly improves ad image assessment.

03

State-of-the-art performance achieved on real-world ad datasets.

Abstract

Assessing advertisements, specifically on the basis of user preferences and ad quality, is crucial to the marketing industry. Although recent studies have attempted to use deep neural networks for this purpose, these studies have not utilized image-related auxiliary attributes, which include embedded text frequently found in ad images. We therefore investigated the influence of these attributes on ad image preferences. First, we analyzed large-scale real-world ad log data and, based on our findings, proposed a novel multi-step modality fusion network (M2FN) that identifies advertising images likely to match user preferences. Our method utilizes auxiliary attributes through multiple steps in the network, which include conditional batch normalization-based low-level fusion and attention-based high-level fusion. We verified M2FN on the AVA dataset, which is widely used for aesthetic…
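The abstract names two fusion points: conditional batch normalization (CBN) for low-level fusion and attention for high-level fusion. The sketch below illustrates both ideas in plain Python under stated assumptions; it is not the authors' implementation. The function names `conditional_batch_norm` and `attention_fusion`, the linear prediction of per-channel scale and shift from the auxiliary vector (using the common CBN form gamma = 1 + delta_gamma), the use of the auxiliary embedding as a scaled dot-product attention query, and the final concatenation are all hypothetical choices, since this summary does not give M2FN's exact layer design.

```python
import math


def conditional_batch_norm(features, aux, w_gamma, w_beta, eps=1e-5):
    """Hypothetical low-level fusion: batch-normalize image features, then
    scale and shift them with parameters predicted from auxiliary attributes.

    features: batch x C feature values; aux: batch x A auxiliary attributes;
    w_gamma, w_beta: C x A linear maps (assumed, not from the paper).
    Uses training-mode batch statistics only (no running averages).
    """
    batch, channels = len(features), len(features[0])
    out = [[0.0] * channels for _ in range(batch)]
    for c in range(channels):
        col = [features[b][c] for b in range(batch)]
        mu = sum(col) / batch
        var = sum((x - mu) ** 2 for x in col) / batch
        for b in range(batch):
            x_hat = (features[b][c] - mu) / math.sqrt(var + eps)
            # Per-sample modulation predicted from that sample's attributes.
            delta_gamma = sum(w * a for w, a in zip(w_gamma[c], aux[b]))
            beta = sum(w * a for w, a in zip(w_beta[c], aux[b]))
            out[b][c] = (1.0 + delta_gamma) * x_hat + beta
    return out


def attention_fusion(regions, query):
    """Hypothetical high-level fusion: attend over image region vectors using
    the auxiliary embedding as the query, then concatenate both.

    regions: N region vectors of dimension D; query: auxiliary embedding (D).
    Returns (fused vector of length 2D, attention weights of length N).
    """
    d = len(query)
    # Scaled dot-product scores between each region and the query.
    scores = [sum(r_i * q_i for r_i, q_i in zip(r, query)) / math.sqrt(d)
              for r in regions]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of region vectors.
    attended = [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(d)]
    return attended + list(query), weights


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    aux = [[1.0], [1.0]]
    print(conditional_batch_norm(feats, aux, [[0.5], [0.5]], [[1.0], [1.0]]))
    fused, weights = attention_fusion([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
    print(fused, weights)
```

With an all-zero auxiliary vector, the CBN sketch reduces to ordinary batch normalization, which makes the role of the attributes easy to isolate: they only change how normalized features are rescaled and shifted, early in the network. The attention step, by contrast, operates on pooled high-level region features, so the two functions correspond to the "multiple steps" at which the abstract says attributes enter the model.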
