# Feature Analysis for Assessing the Quality of Wikipedia Articles through Supervised Classification

**Authors:** Elias Bassani, Marco Viviani

arXiv: 1812.02655 · 2018-12-10

## TL;DR

This paper investigates the automatic assessment of Wikipedia article quality using handcrafted features and supervised machine learning, aiming to improve content verification and reduce misinformation.

## Contribution

It introduces a comprehensive set of features for classifying Wikipedia articles by quality, expanding on prior work with detailed feature analysis and evaluation.
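To make the idea of classifying articles from handcrafted features concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the four features (log word count, heading count, `<ref>` count, internal link count), the `low`/`high` labels, the toy articles, and the nearest-centroid classifier are stand-ins, not the paper's feature set, dataset, or algorithms.

```python
import math
import re


def extract_features(wikitext):
    """Map raw wikitext to a small vector of handcrafted features.

    Illustrative features only; not the feature set used in the paper.
    """
    words = re.findall(r"\w+", wikitext)
    # Section headings such as "== History ==" on their own line.
    headings = re.findall(r"^==+[^=\n]+==+\s*$", wikitext, flags=re.MULTILINE)
    # Opening <ref> or <ref name="..."> tags (closing tags do not match).
    references = re.findall(r"<ref[\s>]", wikitext)
    # Internal links such as [[Target]].
    internal_links = re.findall(r"\[\[[^\]]+\]\]", wikitext)
    return [
        math.log1p(len(words)),
        float(len(headings)),
        float(len(references)),
        float(len(internal_links)),
    ]


def train_centroids(samples):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Hypothetical toy articles standing in for a labeled dataset.
STUB = "Foo is a village."
RICH = """Foo is a village in [[Bar]].<ref>Census 2011</ref>
== History ==
Founded in 1200 by [[Baz]].<ref name="h">Smith 1999</ref>
== Geography ==
It lies on the [[Qux]] river.
"""

centroids = train_centroids([
    (extract_features(STUB), "low"),
    (extract_features(RICH), "high"),
])

query = (
    "Quux is a town in [[Bar]].<ref>Atlas</ref>\n"
    "== History ==\nSettled early.<ref>Doe</ref>\n"
    "== Economy ==\nFarming and [[trade]]."
)
label = predict(centroids, extract_features(query))
print(label)
```

In a realistic setting the feature vector would be much wider and the classifier would be one of the standard supervised learners, trained and evaluated on a labeled corpus rather than on two hand-written examples.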

## Key findings

- Several supervised machine learning algorithms, evaluated on a labeled dataset from prior work, produced encouraging results.
- A wider set of features improved classification performance.
- Results demonstrate the effectiveness of handcrafted features.

## Abstract

Nowadays, thanks to Web 2.0 technologies, people can generate and spread content on different social media very easily. In this context, evaluating the quality of the information available online is becoming an increasingly crucial issue. In fact, a constant flow of content is generated every day by often unknown sources, which are not certified by traditional authoritative entities. This requires the development of appropriate methodologies that can systematically evaluate this content, based on 'objective' aspects connected with it. This would help individuals, who increasingly form their opinions based on what they read online and on social media, to come into contact with information that is actually useful and verified. Wikipedia is one of the biggest online resources on which users rely as a source of information. The amount of collaboratively generated content that is sent to the online encyclopedia every day can lead to the creation of low-quality articles (and, consequently, misinformation) if not properly monitored and revised. For this reason, this paper considers the problem of automatically assessing the quality of Wikipedia articles. In particular, the focus is on the analysis of hand-crafted features that can be employed by supervised machine learning techniques to classify Wikipedia articles on a qualitative basis. With respect to prior literature, a wider set of characteristics connected to Wikipedia articles is taken into account and illustrated in detail. Evaluations are performed by considering a labeled dataset provided in a prior work, and different supervised machine learning algorithms, which produced encouraging results with respect to the considered features.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.02655/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1812.02655/full.md

---
Source: https://tomesphere.com/paper/1812.02655