Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu, Hanzhuo Tan, Jing Li, Piji Li

TL;DR
This paper introduces the concept of cross-modality discourse in social media, analyzing how images and texts combine to form coherent meanings, supported by a new dataset and a multimedia encoder achieving state-of-the-art results.
Contribution
It proposes a novel framework for understanding multimedia discourse, introduces a new annotated dataset, and develops an encoder that effectively models image-text interactions.
Findings
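The multimedia encoder, based on multi-head attention with captions, achieves state-of-the-art results.
The new dataset contains 16K multimedia tweets with manually annotated discourse labels.
Cross-modality discourse offers insight into how human readers couple image and text understanding.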
Abstract
Multimedia communication combining text and images is popular on social media. However, few studies examine how images are structured with texts to form coherent meanings in human cognition. To fill this gap, we present a novel concept of cross-modality discourse, reflecting how human readers couple image and text understandings. Text descriptions (named subtitles) are first derived from images in the multimedia contexts. Five labels -- entity-level insertion, projection, and concretization, and scene-level restatement and extension -- are then employed to shape the structure of subtitles and texts and present their joint meanings. As a pilot study, we also build the very first dataset containing 16K multimedia tweets with manually annotated discourse labels. The experimental results show that the multimedia encoder based on multi-head attention with captions is able to…
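The five labels and their two levels can be written down as a small data structure. The sketch below is only an illustration of the grouping stated in the abstract; the class names, the field names (`text`, `subtitle`, `label`), and the helper `labels_at` are hypothetical and do not come from the paper or its dataset release.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    ENTITY = "entity-level"
    SCENE = "scene-level"


class DiscourseLabel(Enum):
    # Each member stores (label name, level), following the abstract's grouping.
    INSERTION = ("insertion", Level.ENTITY)
    PROJECTION = ("projection", Level.ENTITY)
    CONCRETIZATION = ("concretization", Level.ENTITY)
    RESTATEMENT = ("restatement", Level.SCENE)
    EXTENSION = ("extension", Level.SCENE)

    @property
    def label_name(self) -> str:
        return self.value[0]

    @property
    def level(self) -> Level:
        return self.value[1]


@dataclass(frozen=True)
class AnnotatedTweet:
    # Hypothetical record layout: the tweet text, the description derived
    # from the image (the "subtitle"), and one annotated discourse label.
    text: str
    subtitle: str
    label: DiscourseLabel


def labels_at(level: Level) -> list:
    """Return the labels belonging to one level, in declaration order."""
    return [label for label in DiscourseLabel if label.level is level]


if __name__ == "__main__":
    for level in Level:
        names = ", ".join(label.label_name for label in labels_at(level))
        print(f"{level.value}: {names}")
```

Storing the level alongside each label keeps the entity/scene distinction available to downstream code without a separate lookup table.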
Taxonomy
Topics: Multimodal Machine Learning Applications · Translation Studies and Practices · Language, Metaphor, and Cognition
Methods: Softmax · Linear Layer
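The two listed methods, a linear layer followed by softmax, are the usual ingredients of a classification head. As a rough sketch of how such a head could map a feature vector to probabilities over the five discourse labels, the pure-Python example below uses made-up weights and a three-dimensional input; it is not the paper's encoder, and the function names and all numbers are assumptions chosen for illustration.

```python
import math

NUM_LABELS = 5  # five discourse labels, per the abstract


def linear(x, weights, bias):
    """Compute W x + b, with weights given as one row per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, bias):
    """Return (index of the most probable class, class probabilities)."""
    probs = softmax(linear(features, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    # Toy parameters (illustrative only, not learned from any data).
    toy_weights = [
        [0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0],
        [-1.0, 0.0, 0.0],
    ]
    toy_bias = [0.0] * NUM_LABELS
    index, probs = classify([1.0, 0.0, 2.0], toy_weights, toy_bias)
    print(index, [round(p, 3) for p in probs])
```

In a real model the input vector would come from the image-text encoder and the weights would be learned; here they only demonstrate the arithmetic.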
