OliVaR: Improving Olive Variety Recognition using Deep Neural Networks

Hristofor Miho; Giulio Pagnotta; Dorjan Hitaj; Fabio De Gaspari; Luigi Vincenzo Mancini; Georgios Koubouris; Gianluca Godino; Mehmet Hakan; Concepcion Muñoz Diez

arXiv:2303.00431·cs.CV·March 2, 2023

TL;DR

OliVaR is a deep learning-based method that accurately classifies olive varieties from images of the endocarp, offering a cost-effective alternative to traditional morphological and genetic identification methods.

Contribution

This paper introduces OliVaR, a novel deep neural network approach utilizing a teacher-student architecture for olive variety recognition from images.

Findings

01 Achieved over 86% accuracy in classifying 131 olive varieties.

02 Constructed the largest olive variety image dataset to date.

03 Demonstrated the effectiveness of deep learning in agricultural varietal identification.

Equations

$$\widehat{\theta} = \arg\min_{\theta \in \Theta} \sum_{i} l\big(f(\mathbf{x}_{i}; \theta), y_{i}\big) \tag{1}$$

$$\widehat{\theta} = \arg\min_{\theta \in \Theta} \sum_{i} l\big(f(\mathbf{x}_{i}; \theta), y_{i}\big) + \Omega(\theta) \tag{2}$$

Full text

Author contributions (CRediT), listed in the order of the author list:

• Hristofor Miho: Conceptualization of this study, Methodology, Writing - Original draft preparation, Dataset Creation

• Giulio Pagnotta: Conceptualization of this study, Methodology, Writing - Original draft preparation, Software

• Dorjan Hitaj: Conceptualization of this study, Methodology, Writing - Original draft preparation

• Fabio De Gaspari: Conceptualization of this study, Methodology, Writing - Original draft preparation

• Luigi Vincenzo Mancini: Conceptualization of this study, Methodology, Writing - Original draft preparation

• Georgios Koubouris: Methodology, Writing - Original draft preparation, Dataset Creation

• Gianluca Godino: Methodology, Dataset Creation

• Mehmet Hakan: Methodology, Dataset Creation

• Concepcion Muñoz Diez: Methodology, Writing - Original draft preparation, Dataset Creation

[1] Universidad de Cordoba, Cordoba, Spain

[2] Sapienza Università di Roma, Dipartimento di Informatica, Rome, Italy

[3] H.A.O. DEMETER, NAGREF, Institute of Olive Tree, Subtropical Crops and Viticulture, Chania, Crete, Greece

[4] Council for Agricultural Research, Rende, Italy

[5] Olive Research Institute, Izmir, Turkey

OliVaR: Improving Olive Variety Recognition using Deep Neural Networks

Hristofor Miho

Giulio Pagnotta

Dorjan Hitaj

Fabio De Gaspari

Luigi Vincenzo Mancini

Georgios Koubouris

Gianluca Godino

Mehmet Hakan

Concepcion Muñoz Diez

Abstract

The easy and accurate identification of varieties is fundamental in agriculture, especially in the olive sector, where more than 1200 olive varieties are currently known worldwide. Varietal misidentification leads to many potential problems for all the actors in the sector: farmers and nursery workers may establish the wrong variety, leading to its maladaptation in the field; olive oil and table olive producers may label and sell a non-authentic product; consumers may be misled; and breeders may commit errors during targeted crossings between different varieties. To date, the standard for varietal identification and certification consists of two methods: morphological classification and genetic analysis. The morphological classification consists of the visual pairwise comparison of different organs of the olive tree, where the most important organ is considered to be the endocarp. In contrast, different methods for genetic classification exist (RAPDs, SSR, and SNP). Both classification methods present advantages and disadvantages. Visual morphological classification requires highly specialized personnel and is prone to human error. Genetic identification methods are more accurate but incur a high cost and are difficult to implement.

This paper introduces OliVaR, a novel approach to olive varietal identification. OliVaR uses a teacher-student deep learning architecture to learn the defining characteristics of the endocarp of each specific olive variety and perform classification. We construct what is, to the best of our knowledge, the largest olive variety dataset to date, comprising image data for 131 varieties from the Mediterranean basin. We thoroughly test OliVaR on this dataset and show that it correctly predicts olive varieties with over 86% accuracy.

Keywords: machine learning; deep neural networks; olive variety recognition; olive variety identification

1 Introduction

The olive tree (Olea europaea L.) represents a priceless heritage of genetic variability, with more than 1200 varieties worldwide selected over more than 5500 years of cultivation. Due to its unique characteristics, this crop is an inherent part of Mediterranean culture and mythology Rallo et al. (2018); Rugini et al. (2016). Moreover, the olive's high genetic variability has contributed to a wide range of derived products Miho et al. (2021); Rallo et al. (2018). Nowadays, olive genetic resources are conserved by a network of 23 national and international Germplasm Banks (GBs) coordinated by the International Olive Council (“International Olive Council - Germplasm Banks Network,” 2020). The olive oil and table olive trade has experienced a recent market boom Global Trade (2021). Consumers are increasingly interested in healthy food to improve their quality of life and prevent chronic diseases Delgado-Lista et al. (2022); Casini et al. (2014); Luisa Badenes and Byrne (2012). The identification of olive varieties is a complex and crucial process that affects all stakeholders and end-users. Hence, considerable scientific effort is invested in developing new methods for efficient and reliable identification Atienza et al. (2013); Barranco et al. (2000); Belaj et al. (2022); Trujillo et al. (2014); Rugini et al. (2016). The accurate identification of varieties guarantees the correct management of germplasm banks, the distribution and marketing of true-to-type varieties by nurseries, fair trading, and consumer confidence Bartolini et al. (2005); Haouane et al. (2011); Koubouris et al. (2019). The “Protected Designation of Origin” (PDO) label for table olives and olive oil is among the most demanded by consumers, as it is associated with organoleptic quality and nutritional properties Parra-López et al. (2015).
Therefore, the proper identification of olive varieties is essential for numerous reasons, but it is a complex and time-consuming task that requires specialized personnel and expensive equipment Likudis (2016); Satorres Martínez et al. (2018). The most widespread and community-accepted methods for olive varietal identification are based on morphological and genetic markers Trujillo et al. (2014).

Morphological markers in olives were first selected and applied for varietal classification in 1984 by Barranco and Rallo (1984). In the 2000s, a simplified morphological scheme proposed by Barranco et al. (2000) was adopted as the reference by the International Union for the Protection of New Varieties of Plants (UPOV), and it is still in force today UPOV (2011). This pomological scheme allowed the cataloging of 272 Spanish olive varieties Barranco et al. (2005). It includes 24 characters describing the tree (3), leaf (4), fruit (7), and endocarp (10). Out of these 24 characters, those of the endocarp (olive pit) were considered the most meaningful for varietal identification. Indeed, the endocarp could be considered the natural fingerprint of the olive tree Hannachi et al. (2017). The endocarp is the principal organ for varietal identification because: a) environmental factors scarcely influence its morphology; b) it presents significant polymorphism among varieties; c) it is easy to preserve and transport Barranco et al. (2005); and d) its analysis has a low implementation cost Laaribi et al. (2017). However, despite all these benefits, an accurate and reliable morphological characterization of olive varieties requires thorough training and is highly prone to human error Satorres Martínez et al. (2018); Sun (2016).

On the other hand, molecular techniques for olive cultivar identification were developed in the 1990s Belaj et al. (2003); Trujillo et al. (1995). The first genetic markers applied for varietal identification were Random Amplified Polymorphic DNA (RAPD) markers Belaj et al. (2003). Later on, these markers were replaced by microsatellites, or Simple Sequence Repeat (SSR) markers, which demonstrated a robust discrimination capacity thanks to their high polymorphism. SSRs, in combination with morphological markers, have been widely implemented, giving robust and useful results in the identification of olive cultivar collections Emmanouilidou et al. (2018); Trujillo et al. (2014). Genetic markers provide a higher discriminatory capacity than morphological markers. However, some limitations of SSR markers have been observed, such as the difficulty of establishing clear thresholds for intra- and inter-varietal variability Bakkali et al. (2019); Baldoni et al. (2009); Trujillo et al. (2014). Also, in a few cases, phenotypically different accessions presented the same or a very similar SSR profile, leading to their classification as different varieties Barranco et al. (2005). In addition, the International Union for the Protection of New Varieties of Plants (UPOV) primarily credits morphological rather than genetic characterization. Therefore, morphological characterization is mandatory for the technical examinations of distinctness, uniformity, and stability required to register a new variety UPOV (2011). Recently, Single-Nucleotide Polymorphism (SNP) markers joined the list of genetic markers for the varietal identification of olive trees. These markers are described as more powerful than the above-mentioned genetic markers, reducing the genotyping error rate Belaj et al. (2018, 2022). However, the bottleneck of using genetic tools on a large scale remains their high cost, the time they require, and the need for qualified personnel and sophisticated equipment.

This paper introduces OliVaR, a deep learning olive variety recognizer based on endocarp photos. OliVaR uses a knowledge-driven learning paradigm to learn the defining characteristics of the endocarp of each specific olive variety and perform classification. We construct a large-scale dataset of olive endocarp photos, comprising over 72,000 pictures from 131 different olive varieties, and show that OliVaR can classify them with over 86% accuracy.

To summarize, the contributions of this paper are the following:

• We introduce OliVaR, a deep learning model based on the morphological characteristics of the olive endocarp for varietal classification, thereby automating the traditional process of morphological classification.

• We construct a large-scale dataset of over 72,000 olive endocarp photos spanning 131 varieties from 4 of the largest olive germplasm banks of the Mediterranean area. To our knowledge, this is the largest dataset of olive endocarp photos to date.

• We thoroughly evaluate OliVaR on this dataset and show that it is able to recognize olive varieties with high accuracy.

• We analyze which features of the endocarp OliVaR focuses on, and we compare our proposed architecture with a state-of-the-art image recognition neural network.

This paper is organized as follows: Section 2 provides the background knowledge necessary to understand the contributions. Section 3 introduces OliVaR, our DL-based olive variety recognizer. Section 4 describes the experimental setup. Section 5 presents the evaluation results of OliVaR and discusses the findings. Section 6 discusses related work in the domain, and Section 7 concludes the paper.

2 Background

2.1 Deep Learning

Supervised machine learning algorithms make use of labeled data to produce a classifier that is able to predict the label of new, previously unseen instances. Given a set of independent variables $\mathbf{x}$ , the machine learning model should predict a target outcome variable $y$ . To do so, a function that maps these inputs $\mathbf{x}$ to the desired output $y$ needs to be learned. This learning can be expressed using the following optimization problem:

$$\widehat{\theta} = \arg\min_{\theta \in \Theta} \sum_{i} l\big(f(\mathbf{x}_{i}; \theta), y_{i}\big) \tag{1}$$

where $\hat{y}=f(\mathbf{x};\widehat{\theta})$ represents the learning machine. The learned function $f$ provides an estimate of the label $y$ for an input $\mathbf{x}$ . The learning is guided by the loss function $l(\hat{y},y)$ , which measures the error incurred when the prediction $\hat{y}$ differs from $y$ and thereby indicates how the parameters should be tuned for the learned machine to perform better on the task at hand. Machine learning algorithms are typically susceptible to overfitting. Overfitting occurs when the algorithm learns the training data “too” well (i.e., memorizes them), so that its performance does not generalize to unseen data. To cope with this issue, the learning framework in Equation 1 can be modified by adding an extra term $\Omega(\theta)$ that is independent of the training data, as in Equation 2.

$$\widehat{\theta} = \arg\min_{\theta \in \Theta} \sum_{i} l\big(f(\mathbf{x}_{i}; \theta), y_{i}\big) + \Omega(\theta) \tag{2}$$
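As a small illustration of the regularized objective described above, the following sketch fits a linear model by gradient descent with and without a penalty term. It is not taken from the paper: the squared loss, the L2 penalty $\Omega(\theta)=\lambda\lVert\theta\rVert^{2}$, the synthetic data, and the names `fit_linear` and `lam` are all assumptions chosen for brevity.

```python
import numpy as np

def fit_linear(X, y, lam=0.0, lr=0.05, steps=2000):
    """Gradient descent on mean squared loss plus lam * ||theta||^2.

    The loss is averaged over samples (instead of summed) only to keep
    the step size independent of the dataset size.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        residual = X @ theta - y                 # f(x_i; theta) - y_i
        grad_loss = 2.0 * X.T @ residual / n     # gradient of the data term
        grad_penalty = 2.0 * lam * theta         # gradient of Omega(theta)
        theta -= lr * (grad_loss + grad_penalty)
    return theta

# Synthetic data (assumption): 50 samples, 5 features, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_theta = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ true_theta + 0.1 * rng.normal(size=50)

theta_plain = fit_linear(X, y, lam=0.0)  # objective without the penalty
theta_reg = fit_linear(X, y, lam=1.0)    # objective with the penalty
```

The penalized solution has a smaller norm than the unpenalized one, which is the mechanism by which $\Omega(\theta)$ discourages fitting the training data too closely.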

Supervised learning algorithms such as Support Vector Machines (SVMs) Scholkopf and Smola (2001), Random Forests Breiman (2001), and deep neural networks (DNNs) Goodfellow et al. (2016) can be expressed using Equation 2. Deep Learning (DL) relies heavily on Neural Networks (NNs), which are machine learning (ML) algorithms inspired by the human brain and designed to resemble the interactions amongst neurons Mitchell (1997). While standard ML algorithms require handcrafted features to operate, NNs determine relevant features on their own, learning them directly from the input data during the training process Goodfellow et al. (2016). Two main requirements underlie the success of NNs in general: 1) large quantities of training data, and 2) powerful computational resources. Large amounts of diverse training data enable NNs to learn features suitable for the task at hand, while simultaneously preventing them from memorizing (i.e., overfitting) the training data. Such features are better learned when NNs have multiple layers, hence the term deep neural networks. Research has shown that single-layer, shallow counterparts are not good at learning meaningful features and are often outperformed by other ML algorithms Goodfellow et al. (2016). DNN training translates to vast numbers of computations that require powerful resources, with graphics processing units (GPUs) being a prime example. DL is the key factor behind the increased interest in research and development in the area of Artificial Intelligence (AI), resulting in a surge of ML-based applications that are reshaping entire fields and seeding new ones. Variations of DNNs, the algorithms at the core of DL, have been successfully applied in multiple domains, including, but not limited to, image classification Simonyan and Zisserman (2014); He et al. (2016); Chollet (2017), natural language processing Kim (2014); Chen and Manning (2014); Bansal et al. (2016), speech recognition Graves et al. (2013); Hinton et al. (2012), data (image, text, audio) generation Menick and Kalchbrenner (2019); Karras et al. (2019); Behrmann et al. (2019); Pagnotta et al. (2022), cyber-security De Gaspari et al. (2019); Hitaj et al. (2022, 2023), and even aiding with the COVID-19 pandemic Lozano et al. (2021).

3 OliVaR

In this section, we present OliVaR, our novel neural network-based olive variety recognition approach. OliVaR is built around a knowledge-driven learning (KDL) paradigm. The KDL paradigm boosts the learning capabilities of a model over a dataset by constructing an ensemble of models that act as experts. These experts are used via transfer learning to guide the learning process of another deep neural network model. The KDL paradigm has been shown to assist the training of an ML model by allowing it to a) converge faster and b) achieve good performance under conditions of limited training data or high task complexity. Both characteristics are welcome given the task at hand, especially the second: collecting the olive fruits, performing the genetic checks that guarantee varietal authenticity and correct labeling, and processing the olive stones (endocarps) to be photographed is laborious and requires significant effort and cost. Therefore, in this study, we had to limit the number of photographed endocarps to 150-200 per variety, since we evaluate 131 different olive varieties distributed across the international Germplasm Banks (GBs) of 4 different countries (Spain, Italy, Greece, and Turkey).

We base OliVaR on the recent work on the KDL approach by Avola et al. (2019, 2022). As in Avola et al. (2019, 2022), our olive variety recognizer consists of three main components: the data augmentation component, the ensemble of experts, and the knowledge-driven component. In what follows, the latter two are referred to as the OliVaR model architecture.

3.1 Data augmentation component

Given that we adopt the KDL paradigm because of the scarcity of data and the complexity of the task, we first apply data augmentation techniques to extract more information from the data. To do so, we enhance the features of the olive endocarps, such as the texture of the endocarp surface, to help the model learn the small differences that occur between different olive varieties. In what follows, we present the preprocessing steps undertaken:

1. First, we convert the image to grayscale (as shown in Figure 1(a)) to eliminate possible bias from the colors, which can be affected by the different lighting conditions that may have been present during data collection. Once the grayscale image is obtained, we proceed with the data augmentation techniques we selected to enhance the peculiarities of the olive endocarp.

2. Secondly, we apply the Local Binary Pattern (LBP) Ojala et al. (1994) augmentation technique to the grayscale version of the image. The LBP method is typically used to study the local properties of an image and to identify the characteristics of its individual parts, such as textural information, using a combination of statistical and structural methods (as shown in Figure 1(b)).

3. Thirdly, we apply the Discrete Wavelet Transform (DWT) Graps (1995) to the grayscale version of the image. DWT has been successfully used in state-of-the-art applications for texture recognition, and it is instrumental in this particular task for highlighting the texture of the olive endocarp surface (as shown in Figure 1(c)).

After obtaining the augmented images, we stack them together to obtain a three-channel image in which, instead of the RGB channels, the individual channels hold the grayscale image, the LBP-generated image, and the DWT-generated image. We can do so because all three images are single-channel, and thus the stacking procedure results in an image of the olive endocarp with shape $(w\times h\times 3)$ , where $w$ and $h$ correspond to the width and height of the image and each channel is a different representation of the olive endocarp (i.e., grayscale, LBP, DWT). Once the images are stacked together, we obtain an image like the one depicted in Figure 2. We process the whole dataset following this procedure. We chose the LBP and DWT representations (alongside the grayscale version of the olive endocarp) because they have been shown to provide more information about the texture of small objects, resulting in better performance in such tasks Avola et al. (2019).
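The three preprocessing steps and the channel stacking can be sketched in a few lines of NumPy. This is a simplified illustration under several assumptions that the text does not specify: a basic 3x3 LBP with 8 neighbours, a single-level Haar wavelet whose detail sub-bands are combined into one magnitude map and upsampled back to the input size, and min-max scaling of each channel. The function names are hypothetical.

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted sum of R, G, B (ITU-R BT.601 luma weights)."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def lbp_8neighbour(gray):
    """Basic 3x3 LBP: one bit per neighbour that is >= the centre pixel."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    codes = np.zeros((h, w), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        codes |= (neighbour >= gray).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def haar_dwt_detail(gray):
    """Single-level 2D Haar DWT; returns the detail magnitude at input size."""
    h, w = gray.shape
    g = gray[:h - h % 2, :w - w % 2]            # crop to even dimensions
    a, b = g[0::2, 0::2], g[0::2, 1::2]          # the four pixels of each 2x2 block
    c, d = g[1::2, 0::2], g[1::2, 1::2]
    d1 = (a + b - c - d) / 2.0                   # three detail sub-bands
    d2 = (a - b + c - d) / 2.0
    d3 = (a - b - c + d) / 2.0
    detail = np.sqrt(d1 ** 2 + d2 ** 2 + d3 ** 2)
    up = np.repeat(np.repeat(detail, 2, axis=0), 2, axis=1)  # nearest-neighbour upsample
    return np.pad(up, ((0, h % 2), (0, w % 2)), mode="edge")

def minmax(x):
    """Scale to [0, 1]; a constant image maps to all zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x, dtype=np.float64)

def stack_channels(rgb):
    """Return an (h, w, 3) array with grayscale, LBP and DWT channels."""
    gray = to_grayscale(rgb)
    lbp = lbp_8neighbour(gray).astype(np.float64)
    dwt = haar_dwt_detail(gray)
    return np.stack([minmax(gray), minmax(lbp), minmax(dwt)], axis=-1)

# Usage on a random stand-in for an endocarp photo.
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(33, 40, 3)).astype(np.uint8)
stacked = stack_channels(photo)
```

In practice one would more likely rely on established implementations of LBP and DWT from an image-processing library; the hand-written versions here only make the data flow explicit.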

3.2 OliVaR Architecture

As previously mentioned, the OliVaR architecture is composed of two main components, the ensemble of experts and the knowledge-driven (KD) component, which have proven effective in other computer vision tasks Avola et al. (2022). Figure 3 gives a high-level overview of the OliVaR architecture. The ensemble of experts is composed of three pre-trained neural network architectures, referred to as experts in prior work, that are fine-tuned to solve the olive variety recognition task. Each expert is modified by removing its original last layer and substituting it with a new dense layer. During training, all the weights of each expert except those of the new final dense layer are frozen, i.e., only the last dense layer is updated. The experts are fine-tuned to recognize the 131 different olive varieties, and their predictions are concatenated and further processed by a neural network composed of three dense layers. We highlight that, while the experts’ weights are frozen except for the last layer, the weights of this neural network are updated throughout training to learn the best interpretation of the experts’ predictions. This interpretation is combined with the prediction of the other component of our architecture, which is a DenseNet. The DenseNet is trained from scratch to recognize the different varieties based on the augmented olive endocarp images and the experts’ prediction. Specifically, to compose the KD component of our architecture, the prediction of the DenseNet is concatenated with the experts’ prediction and then passed through another neural network consisting of three dense layers. In Section 5 we show that this custom architecture allows OliVaR to attain good performance in the olive variety recognition task, classifying 131 different olive varieties.
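The data flow of the fusion stage can be made concrete with a shape-level sketch. It only mirrors the concatenation and dense-layer structure described above: the experts and the DenseNet are replaced by random score vectors, the weights are random rather than trained, and the hidden width `HIDDEN`, the ReLU activations, and all function names are assumptions not stated in the text.

```python
import numpy as np

NUM_CLASSES = 131   # number of olive varieties, from the text
HIDDEN = 256        # hidden width of the dense layers (assumption)

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Random weights and zero bias for one dense layer (untrained)."""
    return rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim)), np.zeros(out_dim)

def make_mlp3(in_dim):
    """Three dense layers mapping in_dim -> HIDDEN -> HIDDEN -> NUM_CLASSES."""
    return [dense(in_dim, HIDDEN), dense(HIDDEN, HIDDEN), dense(HIDDEN, NUM_CLASSES)]

def mlp3(x, layers):
    """Apply the layers with ReLU between them and a linear output."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

expert_head = make_mlp3(3 * NUM_CLASSES)   # interprets the three experts' predictions
fusion_head = make_mlp3(2 * NUM_CLASSES)   # combines DenseNet and expert interpretation

def olivar_forward(expert_scores, densenet_scores):
    """expert_scores: three (batch, 131) arrays; densenet_scores: (batch, 131)."""
    experts = np.concatenate(expert_scores, axis=1)                     # (batch, 393)
    interpretation = mlp3(experts, expert_head)                          # (batch, 131)
    fused = np.concatenate([densenet_scores, interpretation], axis=1)   # (batch, 262)
    return mlp3(fused, fusion_head)                                      # (batch, 131)

# Stand-ins for the outputs of the three fine-tuned experts and the DenseNet.
batch = 4
fake_experts = [rng.normal(size=(batch, NUM_CLASSES)) for _ in range(3)]
fake_densenet = rng.normal(size=(batch, NUM_CLASSES))
scores = olivar_forward(fake_experts, fake_densenet)
predicted = scores.argmax(axis=1)   # predicted variety index per image
```

In a training framework such as PyTorch, the frozen expert backbones would simply be excluded from gradient updates, while the two three-layer heads and the DenseNet would be optimized jointly.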

4 Experimental Setup

4.1 Dataset

The dataset used to train OliVaR consists of 72,690 images spanning 131 classes corresponding to 131 different olive varieties. The dataset was created in collaboration with four international olive germplasm banks, which collected the fruits and verified the labeling of each variety using SSR genetic markers. For each variety, about 150-200 olive fruits were collected at a maturity index above two, once pit hardening and fruit formation had been consolidated Rallo et al. (2018). The endocarp was removed from the fruit flesh and carefully cleaned so that the endocarp patterns were as clean and well preserved as possible.

The endocarp cleaning procedure consisted of two main steps:

• Endocarp extraction: The endocarps were extracted from the olives using a manual pitting machine. To ease the extraction and cleaning process, the fruits first underwent a freeze-thawing process to soften the flesh.

• Cleaning, bleaching and drying: Once the endocarps were extracted from the fruits, the remaining fruit flesh was removed by scrubbing the endocarps against a plastic mesh and rinsing them with water. This step was important because any flesh residue left on the endocarp obscures its morphological characteristics and favors the growth of fungus. The next step was bleaching, for which a 50% solution of a bleaching agent (e.g., sodium hypochlorite) was used. The endocarps were kept in the solution for 30 to 60 minutes until a clear whitish color was visible. Subsequently, the endocarps were dried at room temperature for one week or at 37ºC in an oven for 48 hours. Once dried, the endocarps were stored in plastic containers labeled with the name of the olive variety to which they belong.

After the cleaning and labeling procedures, two pictures were taken per endocarp. The first picture was taken with the endocarp in a random position, and the second after rotating the endocarp by 180 degrees around its vertical axis (Figure 4) to account for possible endocarp asymmetry. In this way, we build a dataset that guides the ML model to generalize and to recognize the olive variety from an image of any side of the olive endocarp.

4.2 Software and Hardware Requirements

OliVaR is built on top of version 1.7.1 of the PyTorch ML framework Paszke et al. (2019), using an environment with Python version 3.8.5. The experiments were conducted on a desktop PC running the Ubuntu 20.04.2 LTS operating system with a Ryzen 9 3900x processor, 64GB of RAM, and Nvidia GeForce RTX 2080Ti GPU with 11GB of memory.

5 Evaluation

This section thoroughly evaluates the performance of OliVaR on the test set and provides insights on the functionality of the approach.

5.1 Model Performance

We trained OliVaR using the training set described in section 4.1. Our results show that the performance of the custom architecture is superior to the performance of state-of-the-art architectures for image classification.

Figure 5 shows the confusion matrix obtained by evaluating OliVaR on the test set. The accuracy of OliVaR was 86% over the 131 olive varieties present in the test set. The y-axis of the confusion matrix represents the ground truth label of each test set sample, and the x-axis represents the label predicted by the model (OliVaR in this case). In a perfect scenario, only the main diagonal would be highlighted, meaning that the model correctly predicted every sample in the test set. Looking at the confusion matrix, we see that only a few olive varieties are confused with each other, while most are correctly classified and fall on the diagonal. These results demonstrate that a deep learning-based architecture is a good fit for distinguishing olive varieties based on a photo of the endocarp, a method significantly less expensive than a DNA test.
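For readers less familiar with this representation, the following sketch shows how a confusion matrix with ground truth on the rows and predictions on the columns is built, and how accuracy is read off its diagonal. The labels are a tiny made-up example with three classes, not data from the paper, and the function names are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] counts samples of true class i that were predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def accuracy(cm):
    """Fraction of samples on the main diagonal (correct predictions)."""
    return np.trace(cm) / cm.sum()

# Made-up labels for seven samples and three classes.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = accuracy(cm)   # 5 of the 7 samples lie on the diagonal
```

Off-diagonal entries such as `cm[0, 1]` identify which pairs of classes are confused with each other, which is how the few confused varieties are located in a matrix of this kind.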

5.2 What has OliVaR learned?

In this section, we delve deeper into the inner workings of OliVaR and examine what the model has learned and which regions of the image it considers most significant when making a decision. To do so, we employ Gradient-weighted Class Activation Mapping (Grad-CAM) Selvaraju et al. (2017). Grad-CAM uses the gradients of any target concept in a classification network flowing into the final convolutional layer to produce a coarse localization map that highlights the regions of the image the network predominantly relies on for its prediction. We apply Grad-CAM to OliVaR and, in Figures 6 and 7, present the output on samples from the test dataset, specifically for the olive varieties Bosana and Mignolo Cerretano.
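The core Grad-CAM computation is short enough to sketch directly. The function below assumes that the feature maps of the final convolutional layer and the gradients of the target class score with respect to them have already been extracted from the network for one image; how they are obtained depends on the framework and is not shown. The function name and the final normalization to [0, 1] are illustrative choices.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Coarse localization map from last-conv-layer activations and gradients.

    activations, gradients: arrays of shape (channels, h, w) for one image,
    where gradients = d(class score) / d(activations).
    """
    weights = gradients.mean(axis=(1, 2))              # one importance weight per channel
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum of feature maps, (h, w)
    cam = np.maximum(cam, 0.0)                         # keep only positive evidence (ReLU)
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # scale to [0, 1] for display

# Toy example: channel 0 supports the class, channel 1 opposes it.
acts = np.zeros((2, 3, 3))
acts[0, 1, 1] = 1.0
acts[1, 0, 0] = 1.0
grads = np.stack([np.ones((3, 3)), -np.ones((3, 3))])
heatmap = grad_cam(acts, grads)
```

The resulting map has the spatial resolution of the convolutional layer, so it is normally upsampled to the input size before being overlaid on the photo.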

Figure 6 shows that OliVaR focuses specifically on the olive endocarp region of the image, where the highlights mark the endocarp pattern that OliVaR uses to recognize that this particular endocarp belongs to the Bosana variety. Interestingly, in this case Grad-CAM highlights patterns all over the olive endocarp, which correspond to the rough pattern that the Bosana variety exhibits across the whole surface of the endocarp. In contrast, Figure 7 shows that the Grad-CAM output for a different olive variety, namely Mignolo Cerretano, is different: Grad-CAM highlights the upper and lower extremities of the olive endocarp. This means that OliVaR focuses more on those regions when classifying this particular variety. This is noteworthy because the middle section of the surface of this variety is generally smooth, and OliVaR has therefore learned during training to focus more on the extremities. It is also interesting to note that morphological classification performed by human experts mostly focuses on the specific characters of the endocarp extremities as well Barranco et al. (2000, 2005); UPOV (2011).

5.3 Discussion of findings

Nowadays, a common way to solve an image classification task is to take a DNN architecture pretrained on ImageNet Deng et al. (2009), typically ResNet He et al. (2016) or VGG16 and VGG19 Simonyan and Zisserman (2014), and fine-tune it on the dataset of the task at hand. This approach typically works reasonably well for most tasks. The task we address in this work is more difficult to solve with this simple approach because of the limited amount of training data spread over a large number of classes.

In this section we compare one state-of-the-art architecture, namely ResNet, with our custom KDL architecture OliVaR. Figure 8 and Figure 9 show how these two architectures perform. In Figure 8 we note that the training loss of both ResNet and OliVaR follows a similar pattern, with both needing the same number of training epochs to converge. This is a satisfactory result: OliVaR is a much more complex architecture than ResNet, so requiring the same number of epochs is encouraging, especially in light of the models’ performance. As shown in Figure 9, the validation accuracy of OliVaR is over three percentage points higher than that of ResNet, highlighting the improvement that a KDL-tailored architecture such as OliVaR brings to this complex task.

6 Related Work

In the past decade, the scientific community has devoted great attention and effort to computer technologies and statistical methods as an alternative or a complement for carrying out varietal identification of the olive tree in a simpler, quicker, and more cost-effective way, as in the works by Avramidou et al. (2020); Koubouris et al. (2019); Satorres Martínez et al. (2018); Sesil et al. (2019); Vanloot et al. (2014). The common goal of these investigations has been the automation and simplification of the traditional method of varietal morphological characterization, which is based on the manual observation of the endocarp and other organs such as leaves.

To the best of our knowledge, prior to OliVaR, Sesil et al. (2019) are the only ones to have attempted a DL-based approach to olive varietal classification. Contrary to OliVaR, the approach of Sesil et al. relies on olive leaves to perform the classification. Approaches based on olive leaves have been shown to be unreliable, and most prior work has considered the endocarp as the bearer of significant amounts of information for varietal classification Barranco et al. (2005); Koubouris et al. (2018, 2019). Furthermore, the approach by Sesil et al. (2019) is trained and evaluated on only four olive varieties, compared to the 131 varieties on which OliVaR is trained and evaluated, which makes OliVaR a more complete tool for olive varietal classification.

Given that the olive endocarp is considered a more reliable source of information for varietal classification, a significant amount of work has relied on computer and statistical techniques to automate or semi-automate varietal identification through the morphological traits of the endocarp. Specifically, Koubouris et al. (2019), using the statistical method of Classification Binary Tree, correctly classified 42 olive varieties based on 11 endocarp traits previously extracted in a semi-automated way. In addition, statistical analysis of two-dimensional images Koubouris et al. (2018) or three-dimensional images Manolikaki et al. (2022) has been successfully used to characterize 50 olive varieties. Similarly satisfactory results have been obtained by other authors using statistical techniques such as Principal Component Analysis and Partial Least Squares Discriminant Analysis Blazakis et al. (2017); Satorres Martínez et al. (2018); Vanloot et al. (2014). However, these methods have not found widespread use among the olive growing community and the authorities, due to the complexity of transferring the methods across entities Koubouris et al. (2019) and the high cost of the intermediate steps in semi-automatic feature extraction techniques. Nevertheless, all these studies stress that the morphological characteristics of the olive endocarp are a reliable fingerprint for varietal identification.

ML has been successfully introduced in oliviculture and has shown promising results. Specifically, Khosravi et al. (2021) built image-based models to automatically estimate fruit ripening stages, while Cruz et al. (2007) were able to predict or detect Xylella fastidiosa with high accuracy. In a similar approach, Diaz et al. (2004) developed models that classify table olives into different quality categories, depending on skin defects, with an accuracy of more than 90%.

Furthermore, DL has been applied on a large scale to other plant species for crop management, with applications in yield prediction, disease detection, weed detection, crop quality, and species recognition Ali et al. (2017); Hussain et al. (2022); Liakos et al. (2018); Ramos et al. (2017); Sengupta and Lee (2014). Regarding varietal identification and classification in other species using DL models based on morphological characters, several authors have reported very high classification accuracy (between 90% and 98%), for example in the classification of three plum varieties Ropelewska et al. (2022), of 16 grapevine varieties Fuentes et al. (2018), of durian varieties Lim and Chuah (2019), and of three legume varieties via leaf vein pattern analysis Grinblat et al. (2016).

7 Conclusion

In this work, we presented OliVaR, a neural network-based approach that recognizes olive varieties from photos of the endocarp with an accuracy of 86% over the 131 olive varieties considered. OliVaR outperforms human beings in this task, while performing close to DNA-based olive variety recognition. Unlike DNA-based recognition, which typically requires days, expensive equipment, and specialized personnel to obtain a result, OliVaR can provide an answer in just a few milliseconds.

We believe that OliVaR will assist everyone involved in the olive sector, because the quick and accurate authentication of olive varieties is critical to avoid mistakes when establishing olive plantations or when crossbreeding within a breeding program to obtain new varieties with better characteristics. An error in varietal determination may lead to significant economic losses for a farmer or breeder. Furthermore, the rapid detection of olive varieties can help the oil extraction industry to quickly authenticate and differentiate mono-varietal oils and avoid possible fraud against end consumers. However, this is a first, preliminary work of this scale on olive varietal identification. The results still need to be validated over time and across locations to prove the reproducibility of the model before further commercial use.

8 Acknowledgments

This work was supported by GEN4OLIVE, a project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101000427. The research has also been supported by the postdoctoral grant "Margarita Salas" (UCOR01MS, BOUCO n.º 2021/00729), awarded by the University of Córdoba (grants to Public Universities for the requalification of the Spanish university system from the Ministry of Universities, Spain) and funded by the European Union – NextGenerationEU.

Moreover, we are grateful for the generous contribution of the following entities and persons who, in the framework of the GEN4OLIVE project, provided the olive endocarp photos for the development of OliVaR:

• University of Cordoba, Cordoba, Spain: Isabel Trujillo Navas, Diego Barranco Navero and Anna-Maria Volakaki.

• Institute of Olive Tree, Subtropical Crops and Viticulture, Crete, Greece: Ioanna Kaltsa.

• Olive Research Institute, Izmir, Turkey: Melek Gurbuz, Hulya Kaya.

• Council for Agricultural Research, Rende, Italy: Enzo Perri, Rosa Nicoletti and Annamaria Lenco.

• Institut National de la Recherche Agronomique, Marrakech, Morocco: Sara Oulbi.

Bibliography

1. Rallo et al. (2018) L. Rallo, D. Barranco, C. M. Díez, P. Rallo, M. P. Suárez, C. Trapero, F. P. Alfaro, Strategies for Olive (Olea europaea L.) Breeding: Cultivated Genetic Resources and Crossbreeding, in: J. M. Al-Khayri, S. M. Jain, D. V. Johnson (Eds.), Advances in Plant Breeding Strategies: Fruits, volume 3, Springer International Publishing, 2018, pp. 535–600.
2. Rugini et al. (2016) E. Rugini, L. Baldoni, R. Muleo, L. Sebastiani, The Olive Tree Genome, Springer International Publishing, 2016.
3. Miho et al. (2021) H. Miho, J. Moral, D. Barranco, C. A. Ledesma-Escobar, F. Priego-Capote, C. M. Díez, Influence of genetic and interannual factors on the phenolic profiles of virgin olive oils, Food Chemistry 342 (2021) 128357.
4. Rallo et al. (2018) L. Rallo, C. M. Díez, A. Morales-Sillero, H. Miho, F. Priego-Capote, P. Rallo, Quality of olives: A focus on agricultural preharvest factors, Scientia Horticulturae 233 (2018) 491–509.
5. Global Trade (2021) U. Global Trade, European Virgin Olive Oil Exports Expand with Booming Supplies from Greece and Italy, 2021.
6. Delgado-Lista et al. (2022) J. Delgado-Lista, J. F. Alcala-Diaz, J. D. Torres-Peña, G. M. Quintana-Navarro, F. Fuentes, A. Garcia-Rios, A. M. Ortiz-Morales, A. I. Gonzalez-Requero, A. I. Perez-Caballero, E. M. Yubero-Serrano, O. A. Rangel-Zuñiga, A. Camargo, F. Rodriguez-Cantalejo, F. Lopez-Segura, L. Badimon, J. M. Ordovas, F. Perez-Jimenez, P. Perez-Martinez, J. Lopez-Miranda, Long-term secondary prevention of cardiovascular disease with a mediterranean diet and a low-fat diet (cordiopr
7. Casini et al. (2014) L. Casini, C. Contini, N. Marinelli, C. Romano, G. Scozzafava, Nutraceutical olive oil: does it make the difference?, Nutrition and Food Science 44 (2014) 586–600.
8. Luisa Badenes and Byrne (2012) M. Luisa Badenes, D. H. Byrne, Fruit Breeding, Developing Fruit Cultivars with Enhanced Health Properties, Springer Science+Business Media, LLC, 2012.