ATFLRec: A Multimodal Recommender System with Audio-Text Fusion and Low-Rank Adaptation via Instruction-Tuned Large Language Model

Zezheng Qin

arXiv:2409.08543 · cs.IR · September 16, 2024

TL;DR

This paper introduces ATFLRec, a multimodal recommender system that fuses audio and text data using instruction-tuned large language models with low-rank adaptation, improving recommendation accuracy and efficiency.

Contribution

The study proposes a novel multimodal recommendation framework integrating audio and text into LLMs with LoRA, addressing cold-start and computational challenges.
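As a rough illustration of why LoRA reduces the fine-tuning cost mentioned above, the sketch below wraps a frozen dense layer with a trainable low-rank update. It follows the standard LoRA formulation (frozen weight plus a scaled product of two small matrices); the class name `LoRALinear`, the layer size, the rank, and the scaling value are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, d_in, d_out, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a pretrained weight; it stays frozen during fine-tuning.
        self.W = rng.normal(0.0, 0.02, size=(d_out, d_in))
        # Trainable low-rank factors. B starts at zero, so the adapted layer
        # initially produces exactly the same output as the frozen layer.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, d_in); the result has shape (batch, d_out).
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only the two low-rank factors would receive gradient updates.
        return self.A.size + self.B.size

    def full_params(self):
        return self.W.size


# Hypothetical layer size; real LLM projection layers are typically larger.
layer = LoRALinear(d_in=1024, d_out=1024, rank=8)
x = np.ones((2, 1024))
y = layer.forward(x)
print(y.shape)                                      # (2, 1024)
print(layer.trainable_params(), layer.full_params())  # 16384 1048576
```

With these assumed sizes, the adapter trains 16,384 values against 1,048,576 in the frozen weight, about 1.6% of the layer. Using one adapter per modality would, in this sketch, simply mean creating two such instances and routing audio and text inputs to their own adapter.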

Findings

01 ATFLRec outperforms baseline models in AUC scores.

02 Separate LoRA modules for audio and text improve performance.

03 Modality fusion techniques and pooling methods significantly affect results.
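To make the last finding concrete, the sketch below shows what "pooling" and "fusion" can mean in practice: pooling collapses a variable-length sequence of embeddings into a single vector, and fusion combines the audio and text vectors into one representation. The specific options shown (mean, max, and last-step pooling; concatenation and element-wise sum) are common choices assumed here for illustration. This summary does not say which variants the paper evaluates, and the function names `pool` and `fuse` are hypothetical.

```python
import numpy as np


def pool(seq, method="mean"):
    """Collapse a (time, dim) sequence of embeddings into one (dim,) vector."""
    if method == "mean":
        return seq.mean(axis=0)
    if method == "max":
        return seq.max(axis=0)
    if method == "last":
        return seq[-1]
    raise ValueError(f"unknown pooling method: {method}")


def fuse(audio_vec, text_vec, method="concat"):
    """Combine two pooled modality vectors into a single representation."""
    if method == "concat":
        # Keeps both modalities side by side; output dim is the sum of both.
        return np.concatenate([audio_vec, text_vec])
    if method == "sum":
        # Element-wise sum requires both vectors to share the same dimension.
        if audio_vec.shape != text_vec.shape:
            raise ValueError("sum fusion needs vectors of equal shape")
        return audio_vec + text_vec
    raise ValueError(f"unknown fusion method: {method}")


# Toy embeddings: 2 audio frames and 3 text tokens, each of dimension 2.
audio_seq = np.array([[1.0, 4.0], [3.0, 0.0]])
text_seq = np.array([[2.0, 2.0], [0.0, 6.0], [4.0, 1.0]])

mean_concat = fuse(pool(audio_seq, "mean"), pool(text_seq, "mean"), "concat")
max_sum = fuse(pool(audio_seq, "max"), pool(text_seq, "max"), "sum")
print(mean_concat)  # [2. 2. 2. 3.]
print(max_sum)      # [ 7. 10.]
```

The two results differ in both size and content, which hints at why the choice of pooling and fusion can change what a downstream model sees.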

Abstract

Recommender Systems (RS) play a pivotal role in boosting user satisfaction by providing personalized product suggestions in domains such as e-commerce and entertainment. This study examines the integration of multimodal data, namely text and audio, into large language models (LLMs) with the aim of enhancing recommendation performance. Traditional text and audio recommenders encounter limitations such as the cold-start problem, and recent advances in LLMs, while promising, are computationally expensive. To address these issues, Low-Rank Adaptation (LoRA) is introduced, which improves efficiency without compromising performance. The ATFLRec framework is proposed to integrate audio and text modalities into a multimodal recommendation system, using various LoRA configurations and modality fusion techniques. Results indicate that ATFLRec outperforms baseline models, including traditional and…

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Topic Modeling