Transformer-Driven Triple Fusion Framework for Enhanced Multimodal Author Intent Classification in Low-Resource Bangla

Ariful Islam; Tanvir Mahmud; Md Rifat Hossen

arXiv:2511.23287 · cs.LG · December 1, 2025

TL;DR

This paper introduces a novel intermediate fusion framework that combines transformer-based language and vision models for author intent classification in low-resource Bangla social media content, reaching 84.11% macro-F1 on the Uddessho dataset and outperforming early and late fusion.

Contribution

The study presents a new intermediate fusion strategy with transformer models that outperforms early and late fusion in Bangla multimodal intent classification.

Findings

01

Achieved 84.11% macro-F1 score, setting a new state-of-the-art.

02

Intermediate fusion with mBERT and Swin Transformer yields the best results.

03

Integrating visual context enhances intent classification accuracy.
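
For readers unfamiliar with the metric in finding 01, the following is a minimal sketch of how macro-F1 is computed: the F1 score is calculated per class and then averaged with equal weight, so rare intent categories count as much as frequent ones. The function name `macro_f1` and the placeholder labels are illustrative assumptions; the paper's six intent category names are not given here.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0 here;
        # other conventions exist, this one is an assumption of the sketch.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Placeholder labels, not the dataset's real intent categories.
labels = ["intent_a", "intent_b", "intent_c"]
y_true = ["intent_a", "intent_a", "intent_b", "intent_c"]
y_pred = ["intent_a", "intent_b", "intent_b", "intent_c"]
score = macro_f1(y_true, y_pred, labels)
```

In this toy example the per-class F1 values are 2/3, 2/3, and 1, so the macro average is 7/9.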

Abstract

The expansion of the Internet and social networks has led to an explosion of user-generated content. Author intent understanding plays a crucial role in interpreting social media content. This paper addresses author intent classification in Bangla social media posts by leveraging both textual and visual data. Recognizing limitations in previous unimodal approaches, we systematically benchmark transformer-based language models (mBERT, DistilBERT, XLM-RoBERTa) and vision architectures (ViT, Swin, SwiftFormer, ResNet, DenseNet, MobileNet), utilizing the Uddessho dataset of 3,048 posts spanning six practical intent categories. We introduce a novel intermediate fusion strategy that significantly outperforms early and late fusion on this task. Experimental results show that intermediate fusion, particularly with mBERT and Swin Transformer, achieves 84.11% macro-F1 score, establishing a new…
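
The abstract does not spell out the fusion architecture, so the following is only a rough sketch of the general distinction between late and intermediate fusion, under stated assumptions. Late fusion is taken to mean averaging the class probabilities of two separate unimodal classifiers; intermediate fusion is taken to mean concatenating the text and image feature vectors and passing them through a shared hidden layer before classification. All function names, dimensions, and the random weights are hypothetical; in the paper's setting the features would come from encoders such as mBERT and Swin Transformer rather than a random number generator.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Hypothetical untrained layer, used only to make the sketch runnable."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def late_fusion(text_feat, image_feat, text_head, image_head):
    # Each modality is classified on its own; only the predictions are merged.
    p_text = softmax(linear(text_feat, *text_head))
    p_image = softmax(linear(image_feat, *image_head))
    return [(a + b) / 2 for a, b in zip(p_text, p_image)]


def intermediate_fusion(text_feat, image_feat, joint_hidden, joint_head):
    # Features are concatenated first, so the shared layers can model
    # interactions between the two modalities before classification.
    fused = text_feat + image_feat
    hidden = [max(0.0, h) for h in linear(fused, *joint_hidden)]  # ReLU
    return softmax(linear(hidden, *joint_head))


rng = random.Random(0)
TEXT_DIM, IMAGE_DIM, HIDDEN_DIM, NUM_CLASSES = 8, 8, 16, 6  # six intent classes

# Stand-ins for encoder outputs (assumed, not real mBERT/Swin features).
text_feat = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]
image_feat = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]

late_probs = late_fusion(
    text_feat, image_feat,
    random_layer(rng, NUM_CLASSES, TEXT_DIM),
    random_layer(rng, NUM_CLASSES, IMAGE_DIM),
)
intermediate_probs = intermediate_fusion(
    text_feat, image_feat,
    random_layer(rng, HIDDEN_DIM, TEXT_DIM + IMAGE_DIM),
    random_layer(rng, NUM_CLASSES, HIDDEN_DIM),
)
```

Both functions return a probability distribution over the six classes. The difference is where the modalities meet: at the prediction level in late fusion, and at the feature level in intermediate fusion.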

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining