Uncovering bias in the PlantVillage dataset

Mehmet Alican Noyan

arXiv:2206.04374 · cs.CV · June 10, 2022 · 22 citations

TL;DR

This paper reveals that the PlantVillage dataset contains biases that models can exploit, leading to misleadingly high accuracy, and discusses potential methods to address this issue.

Contribution

The study uncovers bias in the PlantVillage dataset and demonstrates how models can achieve high accuracy using minimal background information, highlighting the need for dataset quality assessment.

Findings

01

A model trained on only 8 background pixels reached 49.0% accuracy on the held-out test set, against 2.6% for random guessing

02

Bias in the dataset lets models predict labels from minimal information

03

Discussion of approaches to mitigate dataset bias

Abstract

We report our investigation into the use of the popular PlantVillage dataset for training deep-learning-based plant disease detection models. We trained a machine learning model using only 8 pixels from the PlantVillage image backgrounds. The model achieved 49.0% accuracy on the held-out test set, well above the random guessing accuracy of 2.6%. This result indicates that the PlantVillage dataset contains noise correlated with the labels and that deep learning models can easily exploit this bias to make predictions. Possible approaches to alleviate this problem are discussed.
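
The experiment described in the abstract can be illustrated with a minimal sketch. Everything beyond the abstract's numbers is an assumption here: the 8 pixels are taken from the four corners and four edge midpoints, the count of 38 classes is inferred from the 2.6% random-guess figure, the images are synthetic stand-ins (not PlantVillage data), and a simple nearest-centroid classifier stands in for whatever model the paper used. The helper names (`border_pixels`, `synthetic_image`, `run_demo`) are hypothetical.

```python
import random


def border_pixels(image):
    """Flatten the RGB values of 8 border pixels into 24 features.

    Assumption: the 8 pixels are the 4 corners and 4 edge midpoints.
    `image` is a nested list of shape height x width x 3.
    """
    h, w = len(image), len(image[0])
    coords = [
        (0, 0), (0, w // 2), (0, w - 1),
        (h // 2, 0), (h // 2, w - 1),
        (h - 1, 0), (h - 1, w // 2), (h - 1, w - 1),
    ]
    return [float(ch) for r, c in coords for ch in image[r][c]]


def random_guess_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / num_classes


def synthetic_image(label, rng, size=16):
    """Hypothetical image whose background brightness leaks the label.

    The central 'leaf' region is identical for every label, so only the
    background carries class information.
    """
    bg = 60 + 120 * label + rng.randint(-20, 20)
    img = [[(bg, bg, bg) for _ in range(size)] for _ in range(size)]
    for r in range(size // 4, 3 * size // 4):
        for c in range(size // 4, 3 * size // 4):
            img[r][c] = (30, 160, 40)
    return img


def fit_centroids(features, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


def run_demo(seed=0, n_train=40, n_test=20):
    """Train and evaluate on background pixels only; return test accuracy."""
    rng = random.Random(seed)
    train_labels = [i % 2 for i in range(n_train)]
    test_labels = [i % 2 for i in range(n_test)]
    train_x = [border_pixels(synthetic_image(y, rng)) for y in train_labels]
    test_x = [border_pixels(synthetic_image(y, rng)) for y in test_labels]
    centroids = fit_centroids(train_x, train_labels)
    correct = sum(predict(centroids, x) == y for x, y in zip(test_x, test_labels))
    return correct / n_test
```

Under the 38-class assumption, `round(100 * random_guess_accuracy(38), 1)` gives 2.6, matching the baseline quoted in the abstract. On the synthetic two-class data, `run_demo()` classifies every test image correctly without ever looking at the leaf region, which is the failure mode the paper warns about: a high score that reflects the background, not the disease.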

Code & Models

Repositories

Ipsumio/plantvillage_bias
none · Official

Taxonomy

Topics: Smart Agriculture and AI

Methods: Test