Transformer-Driven Triple Fusion Framework for Enhanced Multimodal Author Intent Classification in Low-Resource Bangla
Ariful Islam, Tanvir Mahmud, Md Rifat Hossen

TL;DR
This paper introduces a multimodal fusion framework that combines transformer-based text and vision models for author intent classification in low-resource Bangla social media content, reaching 84.11% macro-F1 and outperforming previous methods.
Contribution
The study presents a new intermediate fusion strategy with transformer models that outperforms existing approaches in Bangla multimodal intent classification.
Findings
The best configuration achieves a macro-F1 score of 84.11%, setting a new state of the art.
Intermediate fusion with mBERT and Swin Transformer yields the best results.
Integrating visual context enhances intent classification accuracy.
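The headline metric above is macro-F1, the unweighted mean of per-class F1 scores, so every intent category counts equally regardless of how many posts it has. The following is a minimal sketch of that computation in plain Python. The function name `macro_f1` and the toy labels `"a"`, `"b"`, `"c"` are illustrative assumptions, not taken from the paper or the Uddessho dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Return the unweighted mean of per-class F1 scores.

    Hypothetical helper for illustration; classes with no true or
    predicted examples contribute an F1 of 0.0.
    """
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with three made-up classes.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
score = macro_f1(y_true, y_pred)
```

Because each class is weighted equally, a model that neglects a rare intent category is penalized more under macro-F1 than under plain accuracy.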
Abstract
The expansion of the Internet and social networks has led to an explosion of user-generated content. Author intent understanding plays a crucial role in interpreting social media content. This paper addresses author intent classification in Bangla social media posts by leveraging both textual and visual data. Recognizing limitations in previous unimodal approaches, we systematically benchmark transformer-based language models (mBERT, DistilBERT, XLM-RoBERTa) and vision architectures (ViT, Swin, SwiftFormer, ResNet, DenseNet, MobileNet), utilizing the Uddessho dataset of 3,048 posts spanning six practical intent categories. We introduce a novel intermediate fusion strategy that significantly outperforms early and late fusion on this task. Experimental results show that intermediate fusion, particularly with mBERT and Swin Transformer, achieves 84.11% macro-F1 score, establishing a new…
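The abstract contrasts intermediate fusion with early and late fusion but this summary does not describe the architecture. As a rough illustration only, the sketch below shows one common reading of the terms: intermediate fusion concatenates the feature vectors from the text and image encoders and feeds them to a single joint classifier, whereas late fusion averages the class probabilities of two separate classifiers. Everything here is an assumption for illustration: the function names, the toy feature sizes, and the random weights are hypothetical, and the random vectors merely stand in for encoder outputs such as those of mBERT or Swin.

```python
import math
import random


def linear(x, weights, bias):
    """Apply a dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_linear(rng, n_out, n_in):
    """Random (untrained) weights and zero bias for a toy dense layer."""
    w = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def intermediate_fusion(text_feat, image_feat, joint_head):
    """Concatenate modality features, then classify them jointly."""
    fused = text_feat + image_feat  # list concatenation
    return softmax(linear(fused, *joint_head))


def late_fusion(text_feat, image_feat, text_head, image_head):
    """Classify each modality separately, then average the probabilities."""
    p_text = softmax(linear(text_feat, *text_head))
    p_image = softmax(linear(image_feat, *image_head))
    return [(a + b) / 2 for a, b in zip(p_text, p_image)]


rng = random.Random(0)
NUM_CLASSES = 6              # the dataset has six intent categories
TEXT_DIM, IMAGE_DIM = 8, 4   # toy sizes, far smaller than real encoders

# Random stand-ins for pooled encoder outputs (assumption, not real features).
text_feat = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]
image_feat = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]

joint_head = init_linear(rng, NUM_CLASSES, TEXT_DIM + IMAGE_DIM)
text_head = init_linear(rng, NUM_CLASSES, TEXT_DIM)
image_head = init_linear(rng, NUM_CLASSES, IMAGE_DIM)

probs_intermediate = intermediate_fusion(text_feat, image_feat, joint_head)
probs_late = late_fusion(text_feat, image_feat, text_head, image_head)
```

In the intermediate variant the joint classifier can learn interactions between textual and visual features, which late fusion cannot because it only combines the final per-modality predictions.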
Taxonomy
Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining
