Utilizing a Transparency-driven Environment toward Trusted Automatic Genre Classification: A Case Study in Journalism History

Aysenur Bilgin (1); Laura Hollink (1); Jacco van Ossenbruggen (1); Erik Tjong Kim Sang (2); Kim Smeenk (3); Frank Harbers (3); Marcel Broersma (3) ((1) CWI; (2) Netherlands eScience Center; (3) University of Groningen)

arXiv:1810.00968·cs.CL·October 3, 2018

TL;DR

This paper explores how transparency in machine learning models can improve trust and understanding among journalism historians in automatic newspaper genre classification, emphasizing practical impact analysis and environment development.

Contribution

It introduces a transparency-driven environment that enables non-experts to understand and evaluate machine learning models in journalism history research.

Findings

1. Historians' understanding of models increased over time.
2. Transparency aids in responsible machine learning usage.
3. Environment facilitates non-experts' engagement with ML models.

Abstract

With the growing abundance of unlabeled data in real-world tasks, researchers have to rely on the predictions given by black-boxed computational models. However, it is an often neglected fact that these models may be scoring high on accuracy for the wrong reasons. In this paper, we present a practical impact analysis of enabling model transparency by various presentation forms. For this purpose, we developed an environment that empowers non-computer scientists to become practicing data scientists in their own research field. We demonstrate the gradually increasing understanding of journalism historians through a real-world use case study on automatic genre classification of newspaper articles. This study is a first step towards trusted usage of machine learning pipelines in a responsible way.

Code & Models

Repositories

newsgac/platform

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Topic Modeling