Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis
Odysseas S. Chlapanis, Georgios Paraskevopoulos, Alexandros Potamianos

TL;DR
This paper introduces Adapted Multimodal BERT (AMB), a parameter-efficient architecture that fuses audio-visual data with text for sentiment analysis, outperforming state-of-the-art models while maintaining robustness and efficiency.
Contribution
The paper presents a novel BERT-based multimodal model using adapter modules and layer-wise fusion, enabling efficient training and improved performance in sentiment analysis.
Findings
AMB outperforms current state-of-the-art models on CMU-MOSEI.
The approach achieves a 3.4% reduction in error.
It improves 7-class classification accuracy by 2.1%.
Abstract
Multimodal learning pipelines have benefited from the success of pretrained language models. However, this comes at the cost of increased model parameters. In this work, we propose Adapted Multimodal BERT (AMB), a BERT-based architecture for multimodal tasks that uses a combination of adapter modules and intermediate fusion layers. The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations. During the adaptation process the pretrained language model parameters remain frozen, allowing for fast, parameter-efficient training. In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise. Our experiments on sentiment analysis with CMU-MOSEI show that AMB outperforms…
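To make the adapter-plus-fusion idea in the abstract more concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not specify the internals, so the bottleneck adapter form (down-projection, nonlinearity, up-projection, residual add), the additive fusion, the order of operations inside a layer, and every name below (Adapter, FusionLayer, frozen_layer, amb_like_forward) are assumptions chosen for illustration. The frozen pretrained layer is replaced by a fixed stand-in function with no trainable parameters.

```python
import math
import random


def init_linear(in_dim, out_dim, rng, scale=0.02):
    """Return (weight, bias) for a small linear map; weight is a list of rows."""
    weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(x, weight, bias):
    """Compute y = W x + b for a single vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class Adapter:
    """ASSUMPTION: a bottleneck adapter with a residual connection."""

    def __init__(self, hidden_dim, bottleneck_dim, rng):
        self.down = init_linear(hidden_dim, bottleneck_dim, rng)
        # Zero-initialised up-projection, so the adapter starts as the identity.
        self.up = ([[0.0] * bottleneck_dim for _ in range(hidden_dim)],
                   [0.0] * hidden_dim)

    def __call__(self, h):
        delta = linear(relu(linear(h, *self.down)), *self.up)
        return [a + b for a, b in zip(h, delta)]


class FusionLayer:
    """ASSUMPTION: project audio-visual features and add them to the text state."""

    def __init__(self, av_dim, hidden_dim, rng):
        self.proj = init_linear(av_dim, hidden_dim, rng)

    def __call__(self, h_text, av):
        return [a + b for a, b in zip(h_text, linear(av, *self.proj))]


def frozen_layer(h):
    """Hypothetical stand-in for one frozen pretrained layer (nothing to train)."""
    return [math.tanh(v) for v in h]


def amb_like_forward(h_text, av, adapters, fusions):
    """Layer-wise loop: frozen layer, then adapter, then fusion (assumed order)."""
    for adapter, fusion in zip(adapters, fusions):
        h_text = frozen_layer(h_text)
        h_text = adapter(h_text)
        h_text = fusion(h_text, av)
    return h_text


rng = random.Random(0)
hidden_dim, bottleneck_dim, av_dim, num_layers = 4, 2, 3, 2
adapters = [Adapter(hidden_dim, bottleneck_dim, rng) for _ in range(num_layers)]
fusions = [FusionLayer(av_dim, hidden_dim, rng) for _ in range(num_layers)]
fused = amb_like_forward([0.5, -1.0, 2.0, 0.0], [0.1, 0.2, 0.3], adapters, fusions)
```

In this sketch only the adapter and fusion weights would be updated during training, which is the source of the parameter efficiency the abstract describes. A real model would operate on token sequences with a deep learning framework rather than on single vectors.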
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Dense Connections · WordPiece · Residual Connection · Adam · Linear Warmup With Linear Decay
