PGF-Net: A Progressive Gated-Fusion Framework for Efficient Multimodal Sentiment Analysis
Bin Wen, Tien-Ping Tan

TL;DR
PGF-Net is a novel deep learning framework that enables efficient, interpretable, and state-of-the-art multimodal sentiment analysis by dynamically fusing audio, visual, and textual data with parameter-efficient training.
Contribution
The paper introduces three components: progressive intra-layer fusion, adaptive gated arbitration, and a hybrid PEFT strategy. Together they give a multimodal sentiment analysis model that fuses modalities deep inside the encoder, adapts the fusion dynamically, and stays lightweight to train.
Findings
Achieves a state-of-the-art MAE of 0.691 and an F1-score of 86.9% on the MOSI dataset.
Uses only 3.09 million trainable parameters, demonstrating high efficiency.
Outperforms existing models in balancing performance and computational cost.
Abstract
We introduce PGF-Net (Progressive Gated-Fusion Network), a novel deep learning framework designed for efficient and interpretable multimodal sentiment analysis. Our framework incorporates three primary innovations. Firstly, we propose a Progressive Intra-Layer Fusion paradigm, where a Cross-Attention mechanism empowers the textual representation to dynamically query and integrate non-linguistic features from audio and visual streams within the deep layers of a Transformer encoder. This enables a deeper, context-dependent fusion process. Secondly, the model incorporates an Adaptive Gated Arbitration mechanism, which acts as a dynamic controller to balance the original linguistic information against the newly fused multimodal context, ensuring stable and meaningful integration while preventing noise from overwhelming the signal. Lastly, a hybrid Parameter-Efficient Fine-Tuning (PEFT)…
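The first two components described in the abstract, text-queried cross-attention and a gate that balances the original text representation against the fused context, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-head attention, the sigmoid gate computed over the concatenated inputs, the element-wise convex combination, and every name and dimension below (`cross_attention`, `gated_fusion`, `w_g`, the sizes 8 and 4) are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text, other, w_q, w_k, w_v):
    # Text tokens act as queries; the non-linguistic stream supplies keys and values.
    q = text @ w_q                      # (T_text, d)
    k = other @ w_k                     # (T_other, d)
    v = other @ w_v                     # (T_other, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v  # (T_text, d)

def gated_fusion(text, context, w_g, b_g):
    # Assumed gate form: sigmoid over [text; context], applied element-wise.
    z = np.concatenate([text, context], axis=-1) @ w_g + b_g
    gate = 1.0 / (1.0 + np.exp(-z))
    # gate -> 0 keeps the original text features; gate -> 1 takes the fused context.
    return gate * context + (1.0 - gate) * text, gate

# Hypothetical sizes: 5 text tokens (dim 8), 7 audio frames (dim 4).
rng = np.random.default_rng(0)
d, d_audio = 8, 4
text = rng.normal(size=(5, d))
audio = rng.normal(size=(7, d_audio))

w_q = rng.normal(scale=0.1, size=(d, d))
w_k = rng.normal(scale=0.1, size=(d_audio, d))
w_v = rng.normal(scale=0.1, size=(d_audio, d))
w_g = rng.normal(scale=0.1, size=(2 * d, d))
b_g = np.zeros(d)

context = cross_attention(text, audio, w_q, w_k, w_v)
out, gate = gated_fusion(text, context, w_g, b_g)
print(out.shape, float(gate.min()), float(gate.max()))
```

In this sketch, a strongly negative gate bias drives the gate toward zero, so the output falls back to the unmodified text representation. That is one simple way to read the abstract's claim that the gate prevents noisy non-linguistic features from overwhelming the linguistic signal.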
