Distinguishing Commercial from Editorial Content in News
Timo Kats, Peter van der Putten, Jasper Schelling

TL;DR
This paper develops a machine learning approach that distinguishes commercial advertorials from editorial news articles using textual features. On a corpus scraped from four Dutch news sources it achieves over 90% accuracy, and it examines the language differences between the two classes through model coefficients, co-occurrence networks, and t-SNE graphs.
Contribution
Introduces a machine learning model and lexicon for differentiating advertorials from news articles, with comprehensive analysis of language features and corpus structure.
Findings
Achieved over 90% classification accuracy.
Identified key linguistic differences between advertorials and news.
Provided co-occurrence network and t-SNE visualizations revealing language and structural distinctions.
Abstract
How can we distinguish commercial from editorial content in news, or more specifically, differentiate between advertorials and regular news articles? An advertorial is a commercial message written and formatted as an article, making it harder for readers to recognize it as advertising, despite the use of disclaimers. In our research we aim to differentiate the two using a machine learning model and a lexicon derived from it. This was accomplished by scraping 1,000 articles and 1,000 advertorials from four different Dutch news sources and classifying these based on textual features. With this setup our most successful machine learning model had an accuracy of just over 90%. To generate additional insights into differences between news and advertorial language, we also analyzed model coefficients and explored the corpus through co-occurrence networks and t-SNE graphs.
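The idea of classifying on textual features and then reading a lexicon off the model coefficients can be illustrated with a minimal sketch. The abstract does not name the model family or the feature representation, so the bag-of-words logistic regression below is an assumption chosen for illustration, not the authors' pipeline. The English snippets in DOCS are made-up stand-ins for the scraped Dutch corpus, and the function names (train, predict, lexicon) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: label 0 = news article, label 1 = advertorial.
# These sentences are invented for illustration and are not from the paper's corpus.
DOCS = [
    ("the minister announced a new budget in parliament today", 0),
    ("police investigate the fire that damaged the station", 0),
    ("the court ruled on the appeal after a long hearing", 0),
    ("researchers report rising sea levels along the coast", 0),
    ("discover our exclusive offer and book your dream holiday now", 1),
    ("order today and enjoy a free trial of our premium service", 1),
    ("treat yourself with our new collection and save on your order", 1),
    ("book now and enjoy exclusive discounts on your next holiday", 1),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def score(weights, bias, feats):
    """Linear score: bias plus weight * count for every known token."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in feats.items())


def train(docs, epochs=200, lr=0.1):
    """Fit a bag-of-words logistic regression with per-example gradient descent."""
    vocab = {tok for text, _ in docs for tok in tokenize(text)}
    weights = {tok: 0.0 for tok in vocab}
    bias = 0.0
    data = [(Counter(tokenize(text)), label) for text, label in docs]
    for _ in range(epochs):
        for feats, label in data:
            # Clamp the score so math.exp cannot overflow.
            z = max(-30.0, min(30.0, score(weights, bias, feats)))
            prob = 1.0 / (1.0 + math.exp(-z))
            err = prob - label  # gradient of the log loss w.r.t. the score
            bias -= lr * err
            for tok, n in feats.items():
                weights[tok] -= lr * err * n
    return weights, bias


def predict(weights, bias, text):
    """Return 1 (advertorial) if the linear score is positive, else 0 (news)."""
    return 1 if score(weights, bias, Counter(tokenize(text))) > 0 else 0


def lexicon(weights, k=5):
    """Return (news_terms, advertorial_terms): the k most negative and most positive weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1])
    news_terms = [tok for tok, _ in ranked[:k]]
    advertorial_terms = [tok for tok, _ in ranked[-k:][::-1]]
    return news_terms, advertorial_terms
```

Calling train(DOCS) and then lexicon(weights) lists the tokens that push a text toward each class, which is the sense in which a lexicon can be derived from a linear model. On a toy sample this small, function words tend to dominate the ranking; a realistic setup would likely need stop-word handling or term weighting and a held-out test set before any accuracy figure means anything.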
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Authorship Attribution and Profiling · Advanced Text Analysis Techniques
