Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest
Ramtin Davoudi, Kartik Thakkar, Nazanin Donyapour, Tyler Derr, Hamid Karimi

TL;DR
This paper presents a comprehensive evaluation of large language models on three social media analytics tasks (authorship verification, post generation, and user attribute inference) using Twitter (X) data, and provides benchmarks and insights.
Contribution
It introduces a systematic evaluation framework for LLMs in social media analytics, including new sampling strategies, user studies, and standardized benchmarks.
Findings
LLMs show strong performance in authorship verification and attribute inference.
Participants in the user study perceive LLM-generated posts as authentic.
Benchmark results highlight strengths and limitations of current LLMs in social media tasks.
Abstract
In this study, we present the first comprehensive evaluation of modern LLMs (including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT) across three core social media analytics tasks on a Twitter (X) dataset: (I) Social Media Authorship Verification, (II) Social Media Post Generation, and (III) User Attribute Inference. For authorship verification, we introduce a systematic sampling framework over diverse user and post selection strategies and evaluate generalization on newly collected tweets from January 2024 onward to mitigate "seen-data" bias. For post generation, we assess the ability of LLMs to produce authentic, user-like content using comprehensive evaluation metrics. Bridging Tasks I and II, we conduct a user study to measure real users' perceptions of LLM-generated posts conditioned on their own writing. For attribute inference, we annotate…
