Towards a corpus for credibility assessment in software practitioner blog articles

Ashley Williams; Matthew Shardlow; Austen Rainer

arXiv:2106.11159·cs.SE·June 22, 2021

TL;DR

This paper introduces a new corpus of software practitioner blog articles, annotated for the credibility criteria of argumentation and evidence, to support research on credibility assessment of grey literature.

Contribution

The paper creates and publicly shares a labelled corpus with high inter-annotator agreement, and reports initial system performance for identifying claim sentences.

Findings

01. The best reported system, InferSent with an SVM, achieved an F1 score of 0.64.

02. The corpus has a high inter-annotator agreement of 0.82 (Fleiss' Kappa) across three annotators.

03. Preliminary results show promise for credibility detection in grey literature.
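
The agreement figure reported above is a Fleiss' Kappa, which measures agreement among a fixed number of annotators beyond what chance would produce. The following is a minimal sketch of how that statistic is computed, not the authors' code: the function name `fleiss_kappa`, the input layout, and the toy counts are assumptions made for illustration and do not come from the paper or its repository.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' Kappa for a table of rating counts.

    counts[i][j] is the number of annotators who assigned item i
    (e.g. a sentence) to category j (e.g. "claim" / "not claim").
    Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    if n_raters < 2:
        raise ValueError("need at least two annotators per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Per-item agreement: fraction of annotator pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    observed = sum(p_i) / n_items          # mean observed agreement
    expected = sum(p * p for p in p_j)     # agreement expected by chance

    if expected == 1.0:
        # All ratings are in one category; kappa is undefined.
        # Returning 1.0 here is a choice made for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (hypothetical): 4 sentences, 3 annotators, 2 categories.
toy_counts = [
    [3, 0],  # all three annotators chose category 0
    [0, 3],  # all three chose category 1
    [2, 1],  # two-to-one split
    [1, 2],  # one-to-two split
]
toy_kappa = fleiss_kappa(toy_counts)
```

With three annotators, as in this corpus, each row of the table sums to 3. Values close to 1 indicate near-perfect agreement, and values near 0 indicate agreement no better than chance.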

Abstract

Blogs are a source of grey literature which are widely adopted by software practitioners for disseminating opinion and experience. Analysing such articles can provide useful insights into the state-of-practice for software engineering research. However, there are challenges in identifying higher quality content from the large quantity of articles available. Credibility assessment can help in identifying quality content, though there is a lack of existing corpora. Credibility is typically measured through a series of conceptual criteria, with 'argumentation' and 'evidence' being two important criteria. We create a corpus labelled for argumentation and evidence that can aid the credibility community. The corpus consists of articles from the blog of a single software practitioner and is publicly available. Three annotators label the corpus with a series of conceptual credibility…

Code & Models

Repositories

serenpa/Blog-Credibility-Corpus (Official)

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Misinformation and Its Impacts