Characterizing the Language of Online Communities and its Relation to Community Reception
Trang Tran, Mari Ostendorf

TL;DR
This paper analyzes how language style and topic characterize online communities. It finds that style is a better indicator of community identity than topic, and that style similarity to a community, unlike topic similarity, correlates positively with how the community receives a contribution.
Contribution
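It characterizes style with a hybrid word and part-of-speech tag n-gram language model and topic with Latent Dirichlet Allocation (LDA), and shows that style is linked more strongly than topic to both community identity and community reception.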
Findings
Style outperforms topic as an indicator of community identity, even for communities organized around specific topics
Positive correlation between style similarity and community reception
Topic similarity shows no comparable correlation with community reception
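To make the correlation findings concrete, here is a minimal sketch of how a rank correlation between per-comment similarity scores and reception scores could be computed. The summary above does not say which correlation measure the authors used, so the choice of Spearman's rank correlation is an assumption, and the helper names `average_ranks` and `spearman` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumes both inputs have the same length and are not constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In use, `x` would hold one similarity score per comment (style or topic) and `y` the corresponding reception score, such as a vote total. A clearly positive value for style and a near-zero value for topic would match the pattern reported above.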
Abstract
This work investigates style and topic aspects of language in online communities: looking at both utility as an identifier of the community and correlation with community reception of content. Style is characterized using a hybrid word and part-of-speech tag n-gram language model, while topic is represented using Latent Dirichlet Allocation. Experiments with several Reddit forums show that style is a better indicator of community identity than topic, even for communities organized around specific topics. Further, there is a positive correlation between the community reception to a contribution and the style similarity to that community, but not so for topic similarity.
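The style representation can be illustrated with a small sketch. The abstract does not give the construction details, so this assumes one common way to build a hybrid sequence: keep a fixed set of frequent words and replace every other word with its part-of-speech tag. It then scores a comment with an add-one-smoothed bigram model. The names `hybridize`, `BigramLM`, and `KEEP_WORDS`, the bigram order, the smoothing, and the toy tagged comments are all illustrative assumptions, not the authors' setup.

```python
import math
from collections import Counter


def hybridize(tagged_tokens, keep_words):
    """Keep a word if it is in keep_words; otherwise emit its POS tag."""
    return [w.lower() if w.lower() in keep_words else tag
            for w, tag in tagged_tokens]


class BigramLM:
    """Bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, sequences):
        self.context = Counter()   # counts of each token used as a context
        self.bigrams = Counter()   # counts of (previous, current) pairs
        self.vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(padded)
            for prev, cur in zip(padded, padded[1:]):
                self.context[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def cross_entropy(self, seq):
        """Average negative log2 probability per predicted token."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        v = len(self.vocab) + 1    # one extra slot for unseen symbols
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            p = (self.bigrams[(prev, cur)] + 1) / (self.context[prev] + v)
            total -= math.log2(p)
        return total / (len(padded) - 1)


# Hypothetical frequent-word list and toy POS-tagged comments.
KEEP_WORDS = {"i", "think", "the", "is", "lol", "so"}
community_a = [
    [("I", "PRP"), ("think", "VBP"), ("the", "DT"),
     ("theory", "NN"), ("is", "VBZ"), ("sound", "JJ")],
    [("I", "PRP"), ("think", "VBP"), ("the", "DT"),
     ("result", "NN"), ("is", "VBZ"), ("wrong", "JJ")],
]
community_b = [
    [("lol", "UH"), ("so", "RB"), ("cute", "JJ")],
    [("lol", "UH"), ("the", "DT"), ("dog", "NN")],
]
lm_a = BigramLM(hybridize(c, KEEP_WORDS) for c in community_a)
lm_b = BigramLM(hybridize(c, KEEP_WORDS) for c in community_b)
```

Under this setup, a new comment is attributed to the community whose model gives it the lowest cross-entropy, and that cross-entropy (lower means more similar) can serve as a style similarity score. Because content words collapse to tags, the model picks up on phrasing patterns more than on subject matter.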
