Beyond the Link: Assessing LLMs' ability to Classify Political Content across Global Media
Alejandro De La Fuente-Cuesta, Alberto Martinez-Serra, Nienke Visscher, Laia Castro, Ana S. Cardenal

TL;DR
This study evaluates large language models' ability to classify political content from news URLs and article text across five countries, finding that they can be effective but exhibit biases, such as overclassifying centrist news as political.
Contribution
It demonstrates that URL analysis with LLMs can approximate full-text classification and provides methodological insights for political science research.
Findings
URLs contain relevant political information
LLMs can classify political content effectively
Systematic bias towards overclassifying centrist news
Abstract
The use of large language models (LLMs) is becoming common in political science and digital media research. While LLMs have demonstrated their ability in labelling tasks, their effectiveness at classifying Political Content (PC) from URLs remains underexplored. This article evaluates whether LLMs can accurately distinguish PC from non-PC using both the text and the URLs of news articles across five countries (France, Germany, Spain, the UK, and the US) and their respective languages. Using cutting-edge models, we benchmark their performance against human-coded data to assess whether URL-level analysis can approximate full-text analysis. Our findings show that URLs embed relevant information and can serve as a scalable, cost-effective alternative for identifying PC. However, we also uncover systematic biases: LLMs seem to overclassify centrist news as political, leading to false positives that may…
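The abstract does not spell out the pipeline, so the following is only a minimal sketch under our own assumptions, not the authors' code. It illustrates two steps the comparison implies: reducing a URL to the slug tokens that a URL-only prompt could use in place of full text, and scoring binary PC/non-PC predictions against human codes. It also computes a per-group false-positive rate, the kind of breakdown that would surface a tendency to label news from centrist outlets as political. The function names, the slug-splitting rule, and the outlet-group labels are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import unquote, urlparse


def url_to_tokens(url: str) -> list[str]:
    """Hypothetical helper: extract lowercase word tokens from a URL path (slug).

    The domain is ignored, a trailing file extension (e.g. .html) is dropped,
    and purely numeric segments such as dates or article IDs are discarded.
    """
    path = unquote(urlparse(url).path)
    # Strip a trailing extension like ".html" or ".ece" (assumed 2-5 chars)
    path = re.sub(r"\.[a-z0-9]{2,5}$", "", path, flags=re.IGNORECASE)
    tokens = re.split(r"[/\-_.]+", path.lower())
    return [t for t in tokens if t and not t.isdigit()]


def binary_metrics(gold: list[int], pred: list[int]) -> dict[str, float]:
    """Precision, recall, F1 and false-positive rate for the positive class.

    Labels are assumed to be 1 = political content, 0 = non-political,
    with `gold` holding the human codes and `pred` the model's labels.
    """
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def fpr_by_group(gold: list[int], pred: list[int], groups: list[str]) -> dict[str, float]:
    """False-positive rate per group (e.g. hypothetical outlet leanings).

    Only items the human coders marked non-political (gold == 0) count;
    groups with no such items are omitted rather than reported as zero.
    """
    negatives: dict[str, list[int]] = defaultdict(list)
    for g, p, grp in zip(gold, pred, groups):
        if g == 0:
            negatives[grp].append(p)
    return {grp: sum(ps) / len(ps) for grp, ps in negatives.items()}
```

For example, `url_to_tokens("https://www.example.com/politics/2024/05/12/election-results-parliament.html")` keeps only `politics`, `election`, `results`, and `parliament`, which hints at why a URL alone can carry a usable signal. Comparing `fpr_by_group` across groups, rather than reading a single pooled false-positive rate, is what would make a bias concentrated in one group of outlets visible.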
