Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts
Christopher Bagdon, Aidan Combs, Carina Silberer, Roman Klinger

TL;DR
This paper compares emotion-labeled multimodal social media posts created by study participants with genuine posts, analyzing how they differ and what those differences imply for model training and generalization.
Contribution
It provides a detailed comparison of study-created and genuine posts, highlighting differences and assessing their impact on emotion modeling.
Findings
Study-created posts are longer and rely more on their text and less on their images to express emotion.
Participants who donate posts and participants who create them differ demographically.
Models trained on study-created data generalize to genuine data.
Abstract
Accurate modeling of subjective phenomena such as emotion expression requires data annotated with authors' intentions. Commonly such data is collected by asking study participants to donate and label genuine content produced in the real world, or create content fitting particular labels during the study. Asking participants to create content is often simpler to implement and presents fewer risks to participant privacy than data donation. However, it is unclear if and how study-created content may differ from genuine content, and how differences may impact models. We collect study-created and genuine multimodal social media posts labeled for emotion and compare them on several dimensions, including model performance. We find that compared to genuine posts, study-created posts are longer, rely more on their text and less on their images for emotion expression, and focus more on…
Taxonomy
Topics: Digital Communication and Language · Discourse Analysis in Language Studies
