# Comparison of Mask-R-CNN and Thresholding-Based Segmentation for High-Throughput Phenotyping of Walnut Kernel Color

**Authors:** Steven H. Lee, Sean McDowell, Charles Leslie, Kristina McCreery, Mason Earles, Patrick J. Brown

PMC · DOI: 10.3390/plants14213335 · 2025-10-31

## TL;DR

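This paper compares rule-based thresholding with a Mask-R-CNN pipeline for measuring walnut kernel color. The two methods produce highly correlated measurements, but the CNN needs far less manual adjustment, tolerates imperfectly staged images, and gives consistent results across years.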

## Contribution

The study introduces a robust CNN-based segmentation method for walnut kernel color phenotyping that requires minimal manual adjustments.

## Key findings

- Quantitative data from the thresholding and CNN methods were highly correlated for lightness (L*; r² = 0.997) and size (r² = 0.984).
- The CNN method was robust after rapid training on only 13 images, whereas thresholding required many manual adjustments to account for minor discrepancies in staging.
- Human scoring methods were not highly correlated with image analysis methods or with each other.
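
The agreement figures above are squared correlations between per-kernel measurements from the two methods. As a minimal sketch of that statistic, assuming r² here means the squared Pearson correlation (the summary does not say how it was computed), the function below compares two lists of values. The name `r_squared` and the sample numbers are hypothetical and are not data from the paper.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up L* values for five kernels, for illustration only.
    threshold_lightness = [41.2, 48.9, 55.3, 60.1, 37.8]
    cnn_lightness = [41.0, 49.3, 55.0, 60.6, 38.1]
    print(round(r_squared(threshold_lightness, cnn_lightness), 3))
```

A value near 1 means one method's measurements are almost a linear function of the other's; it does not by itself show that the two agree in absolute terms.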

## Abstract

High-throughput phenotyping has become essential for plant breeding programs, replacing traditional methods that rely on subjective scales influenced by human judgment. Machine learning (ML) computer vision systems have successfully used convolutional neural networks (CNNs) for image segmentation, providing greater flexibility than thresholding methods that may require carefully staged images. This study compares two quantitative image analysis methods, rule-based thresholding using the magick package in R and an instance-segmentation pipeline based on the widely used Mask-R-CNN architecture, and then compares the output of each to two different sets of human evaluations. Walnuts were collected over three years from over 3000 individual trees maintained by the UC Davis walnut breeding program. The resulting 90,961 kernels were placed into 100-cell trays and imaged using a 20-megapixel Basler camera with a Sony IMX183 sensor. Quantitative data from both image analysis methods were highly correlated for both lightness (L*; r² = 0.997) and size (r² = 0.984). The thresholding method required many manual adjustments to account for minor discrepancies in staging, while the CNN method was robust after a rapid initial training on only 13 images. The two human scoring methods were not highly correlated with the image analysis methods or with each other. Pixel classification provides data similar to human color assessments but offers greater consistency across different years. The thresholding approach offers flexibility and has been applied to other color-based phenotyping tasks, while the CNN approach can be adapted to images that are not perfectly staged and be retrained to quantify more subtle kernel characteristics such as spotting and shrivel.
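
To make the idea of rule-based thresholding concrete, the sketch below separates kernel pixels from the background with a fixed lightness cutoff and then reports mean L* and pixel count (a proxy for size). It is not the authors' pipeline, which used the magick package in R. The dark-background assumption, the cutoff value, and the function names `srgb_to_lightness` and `mean_kernel_lightness` are hypothetical choices made for illustration; the sRGB to L* conversion is the standard CIE formula with a D65 white point.

```python
def srgb_to_lightness(r, g, b):
    """CIE L* (0 to 100) of an 8-bit sRGB pixel, D65 white point."""
    def linearize(channel):
        c = channel / 255.0
        # Undo the sRGB transfer curve to get linear light.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y from linear RGB.
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


def mean_kernel_lightness(image, background_cutoff=20.0):
    """Return (mean L*, pixel count) over pixels brighter than the cutoff.

    image: rows of (r, g, b) tuples. Assumes a dark background, so any
    pixel with L* above background_cutoff is treated as kernel (assumption).
    """
    kept = []
    for row in image:
        for pixel in row:
            lightness = srgb_to_lightness(*pixel)
            if lightness > background_cutoff:
                kept.append(lightness)
    if not kept:
        return None, 0
    return sum(kept) / len(kept), len(kept)


if __name__ == "__main__":
    # Tiny synthetic 2x3 "image": dark background with three tan pixels.
    tray_cell = [
        [(5, 5, 5), (200, 170, 120), (190, 160, 110)],
        [(4, 4, 4), (210, 180, 130), (6, 6, 6)],
    ]
    print(mean_kernel_lightness(tray_cell))
```

A fixed cutoff like this is exactly what makes thresholding sensitive to staging: a shift in lighting or background moves pixels across the cutoff, which is consistent with the manual adjustments the abstract describes.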

## Full-text entities

- **Species:** Juglans (walnuts, genus) [taxon 16718], Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12610562/full.md

---
Source: https://tomesphere.com/paper/PMC12610562