Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion

Linhan Xia; Yicheng Yang; Ziou Chen; Zheng Yang; Shengxin Zhu

arXiv:2407.09157·cs.IR·July 15, 2024·1 cites

Open Access

TL;DR

This paper introduces a multi-modal transformer-based movie recommendation system that fuses text and poster features using pre-trained models, achieving improved accuracy on standard benchmarks.

Contribution

It presents a novel multi-modal feature fusion approach combining BERT, ViT, and transformer architecture for enhanced movie recommendation accuracy.

Findings

01. Improved prediction accuracy on MovieLens datasets.

02. Effective multi-modal feature integration using pre-trained models.

03. Demonstrated potential for cross-modal recommendation applications.

Abstract

Pre-trained models learn general representations from large datasets, which can be fine-tuned for specific tasks to significantly reduce training time. Pre-trained models such as generative pre-trained transformers (GPT), bidirectional encoder representations from transformers (BERT), and vision transformers (ViT) have become a cornerstone of current research in machine learning. This study proposes a multi-modal movie recommendation system that extracts features from each movie's well-designed poster and from its narrative text description. The system uses the BERT model to extract information from the text modality, the ViT model to extract information from the poster/image modality, and the Transformer architecture to fuse the features of all modalities and predict users' preferences. The integration of pre-trained foundational models with some smaller data sets in downstream…
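The fusion step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes the text and poster embeddings have already been produced by BERT and ViT (random vectors stand in for them here), treats the user, text, and poster vectors as three tokens, applies a single self-attention head, mean-pools the result, and maps it to a score. The function names, the 8-dimensional toy size, the random weights, and the 1 to 5 rating range are all assumptions made for illustration.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def attention_fuse(tokens, Wq, Wk, Wv):
    """Single-head self-attention over modality tokens, then mean pooling."""
    d = len(Wq)  # dimension of the projected queries and keys
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    attended, weights = [], []
    for q in Q:
        # Scaled dot-product scores of this token against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        attended.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(d)])
    pooled = [sum(row[j] for row in attended) / len(attended) for j in range(d)]
    return pooled, weights

def predict_rating(user_vec, text_vec, poster_vec, params):
    """Fuse the three tokens and squash the output into (1, 5) (assumed scale)."""
    fused, weights = attention_fuse(
        [user_vec, text_vec, poster_vec],
        params["Wq"], params["Wk"], params["Wv"],
    )
    logit = sum(w * f for w, f in zip(params["w_out"], fused)) + params["b_out"]
    return 1.0 + 4.0 / (1.0 + math.exp(-logit)), weights

rng = random.Random(0)
DIM = 8  # toy size; BERT-base and ViT-base embeddings are 768-dimensional

params = {
    "Wq": random_matrix(DIM, DIM, rng),
    "Wk": random_matrix(DIM, DIM, rng),
    "Wv": random_matrix(DIM, DIM, rng),
    "w_out": [rng.uniform(-0.1, 0.1) for _ in range(DIM)],
    "b_out": 0.0,
}

# Random stand-ins for a learned user embedding and frozen BERT / ViT features.
user_vec = [rng.gauss(0, 1) for _ in range(DIM)]
text_vec = [rng.gauss(0, 1) for _ in range(DIM)]
poster_vec = [rng.gauss(0, 1) for _ in range(DIM)]

rating, attn = predict_rating(user_vec, text_vec, poster_vec, params)
print("predicted rating:", round(rating, 3))
print("attention of the user token over (user, text, poster):", attn[0])
```

Each row of `attn` sums to 1 and shows how strongly one token attends to the others. In a trained model the row for the user token would indicate how much the poster and the text each contribute to the prediction.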

Taxonomy

Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Attention Dropout · Linear Warmup With Linear Decay · Adam · Dropout