Picturized and Recited with Dialects: A Multimodal Chinese Representation Framework for Sentiment Analysis of Classical Chinese Poetry
Xiaocong Du, Haoyu Pei, Haipeng Zhang

TL;DR
This paper introduces a multimodal framework that combines audio, visual, and textual features, enhanced by dialectal phonetics and large language model translation, to improve sentiment analysis of classical Chinese poetry.
Contribution
It presents a novel dialect-enhanced multimodal approach integrating audio, visual, and textual features with LLM translation for classical Chinese poetry sentiment analysis.
Findings
Achieved an accuracy improvement of at least 2.51% over state-of-the-art methods
Achieved a macro F1 improvement of at least 1.63% over state-of-the-art methods
Open-sourced the code for further research
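For readers unfamiliar with the second metric: macro F1 is the unweighted mean of the per-class F1 scores, so rare sentiment classes count as much as common ones. The sketch below computes it from scratch. The label names and predictions are hypothetical and are not taken from the paper's datasets.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical three-class example (labels are illustrative only).
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.4f} macro_f1={score:.4f}")
```

Here the per-class F1 values are 0.5 (pos), 0.8 (neg), and 2/3 (neu), so accuracy and macro F1 differ even on the same predictions, which is why the two are reported separately.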
Abstract
Classical Chinese poetry is a vital and enduring part of Chinese literature, conveying profound emotional resonance. Existing studies analyze sentiment based on textual meanings, overlooking the unique rhythmic and visual features inherent in poetry, especially since it is often recited and accompanied by Chinese paintings. In this work, we propose a dialect-enhanced multimodal framework for classical Chinese poetry sentiment analysis. We extract sentence-level audio features from the poetry and incorporate audio from multiple dialects, which may retain regional ancient Chinese phonetic features, enriching the phonetic representation. Additionally, we generate sentence-level visual features, and the multimodal features are fused with textual features enhanced by LLM translation through multimodal contrastive representation learning. Our framework outperforms state-of-the-art methods on…
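The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE loss between two modalities. It treats the text and audio embeddings of the same sentence as a positive pair and all other pairings in the batch as negatives. The function names, the temperature of 0.07, the embedding sizes, and the random stand-in features are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE: row i of `a` and row i of `b` form the positive pair.

    The temperature value is an assumed default, not taken from the paper.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(a.shape[0])
    loss_ab = -log_softmax(logits)[idx, idx].mean()    # a -> b direction
    loss_ba = -log_softmax(logits.T)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)


# Random stand-ins for sentence-level features (hypothetical shapes).
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
audio_aligned = text + 0.01 * rng.normal(size=(8, 16))  # matches text row by row
audio_shuffled = audio_aligned[::-1].copy()             # pairing deliberately broken

loss_aligned = symmetric_info_nce(text, audio_aligned)
loss_shuffled = symmetric_info_nce(text, audio_shuffled)
print(loss_aligned, loss_shuffled)
```

The aligned pairing yields a much lower loss than the shuffled one, which is the signal a contrastive objective uses to pull embeddings of the same sentence together across modalities. A third modality such as visual features could be handled by summing pairwise losses, though whether the paper does so is not stated here.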
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Multimodal Machine Learning Applications
