Utilizing a Transparency-driven Environment toward Trusted Automatic Genre Classification: A Case Study in Journalism History
Aysenur Bilgin (1), Laura Hollink (1), Jacco van Ossenbruggen (1), Erik Tjong Kim Sang (2), Kim Smeenk (3), Frank Harbers (3), Marcel Broersma (3) ((1) CWI, (2) Netherlands eScience Center, (3) University of Groningen)

TL;DR
This paper examines how making machine learning models transparent can improve journalism historians' trust in, and understanding of, automatic newspaper genre classification. It does so through a practical impact analysis and an environment built for that purpose.
Contribution
It introduces a transparency-driven environment that enables non-experts to understand and evaluate machine learning models in journalism history research.
Findings
Historians' understanding of models increased over time.
Transparency aids in responsible machine learning usage.
The environment facilitates non-experts' engagement with ML models.
Abstract
With the growing abundance of unlabeled data in real-world tasks, researchers have to rely on the predictions given by black-boxed computational models. However, it is an often neglected fact that these models may be scoring high on accuracy for the wrong reasons. In this paper, we present a practical impact analysis of enabling model transparency by various presentation forms. For this purpose, we developed an environment that empowers non-computer scientists to become practicing data scientists in their own research field. We demonstrate the gradually increasing understanding of journalism historians through a real-world use case study on automatic genre classification of newspaper articles. This study is a first step towards trusted usage of machine learning pipelines in a responsible way.
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Topic Modeling
