Multilingual Disinformation Detection for Digital Advertising
Zofia Trstanova, Nadir El Manouzi, Maryline Chen, Andre L. V. da Cunha, Sergei Ivanov

TL;DR
This paper introduces a multilingual machine learning system that detects disinformation websites in digital advertising, enabling proactive removal of malicious publishers to protect online integrity.
Contribution
It presents the first multilingual approach to identify disinformation publishers in digital advertising using text embeddings and a two-step classification process.
Findings
Effective detection of disinformation websites across multiple languages.
Creates a shortlist for human review to improve accuracy.
Empowers proactive content moderation in digital advertising.
Abstract
In today's world, the presence of online disinformation and propaganda is more widespread than ever. Independent publishers are funded mostly via digital advertising, which is unfortunately also the case for those publishing disinformation content. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we make the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether the page mentions a topic of interest, then estimates the likelihood of the content being malicious, creating a shortlist of publishers that will be reviewed by human experts. Our system empowers internal teams to proactively, rather than defensively,…
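The two-step flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for multilingual text embeddings, and the names `mentions_topic`, `malicious_score`, `shortlist`, the cosine-similarity topic gate, the logistic scorer, the per-publisher averaging, and all thresholds and weights are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mentions_topic(embedding, topic_centroid, threshold=0.5):
    """Step 1 (hypothetical gate): is the page close to a topic of interest?"""
    return cosine(embedding, topic_centroid) >= threshold

def malicious_score(embedding, weights, bias):
    """Step 2 (hypothetical scorer): logistic score on the page embedding."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def shortlist(pages, topic_centroid, weights, bias,
              topic_threshold=0.5, flag_threshold=0.5):
    """Average on-topic page scores per publisher; keep those above the flag threshold."""
    scores = defaultdict(list)
    for publisher, emb in pages:
        if mentions_topic(emb, topic_centroid, topic_threshold):
            scores[publisher].append(malicious_score(emb, weights, bias))
    ranked = [(pub, sum(s) / len(s)) for pub, s in scores.items()]
    flagged = [(pub, s) for pub, s in ranked if s >= flag_threshold]
    return sorted(flagged, key=lambda t: t[1], reverse=True)

# Toy data: in practice the vectors would come from a multilingual embedding model.
topic_centroid = [1.0, 0.0, 0.0]
weights, bias = [0.0, 4.0, -4.0], 0.0
pages = [
    ("a.example", [0.9, 0.8, 0.1]),  # on topic, high score
    ("a.example", [0.8, 0.6, 0.0]),  # on topic, high score
    ("b.example", [0.9, 0.1, 0.7]),  # on topic, low score
    ("c.example", [0.1, 0.9, 0.0]),  # off topic, skipped by step 1
]
result = shortlist(pages, topic_centroid, weights, bias)
```

In this sketch only `a.example` reaches the shortlist, which would then go to human reviewers. Gating on topic first keeps the second classifier from scoring pages that never mention a topic of interest.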
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Media Influence and Politics
