AFruitDB: A comprehensive dataset of six commonly used Asian fruits for advanced grading and biodiversity insights
Mayen Uddin Mojumdar, Shahrin Islam, Md Al Mamun, Rifat Hasan, Shah Md Tanvir Siddiquee, Narayan Ranjan Chakraborty

TL;DR
AFruitDB is a dataset of six Asian fruits with 3,167 images collected to improve fruit grading and support biodiversity research using machine learning.
Contribution
A novel dataset of six Asian fruits collected for advanced grading and biodiversity insights using mobile cameras.
Findings
AFruitDB contains 3,167 images of six fruit types collected from local markets in Bangladesh.
The dataset enables quality grading of fruits into good, medium, and bad categories.
It supports biodiversity conservation and machine learning applications for grading and yield prediction.
Abstract
The Asian subcontinent produces a vast range of fruits throughout the seasons. However, correctly classifying these fruits by quality can be difficult and often requires the knowledge of fruit experts and advanced equipment to produce accurate results. This dataset is presented to support advanced grading systems that efficiently sort and evaluate fruit quality based on characteristics such as shape, color, size, texture, and other crucial parameters. It also helps researchers explore genetic variation, ecological adaptation, and environmental factors that affect fruit qualities, supporting conservation and sustainable agriculture. The images were collected in person with mobile cameras at various times of the day at local markets in Bangladesh that receive optimal sunlight. To create a unique…
Specifications Table
- Subject: Computer Science
- Specific subject area: Fruits grading, Machine learning, Computer vision, Image processing
- Type of data: .jpg images
- Data collection: 3,167 images with dimensions of 224 × 224 pixels were gathered from various local marketplaces throughout Bangladesh. Six fruits were selected for quality grading: tomato, papaya, mango, Burmese grape, apple, and banana. Each fruit is divided into three quality grades (good, medium, and bad), giving a total of 18 subcategories. The data were collected in July, August, and September 2023 under a variety of weather conditions (sunny, cloudy, and rainy) and at different times of the day (morning, noon, and afternoon). Several phone cameras were used, including the Mi 9T, the iPhone SE, and a Vivo phone, and all images were brought to the same resolution. The marketplaces include Karwan Bazar, Park Bazar, Santosh Bazar, and Jatrabari Bazar. The data were pre-processed to enhance visual quality for fruit grading. Because some images were captured with an iPhone SE, all pictures were converted to JPG format. Background removal was used to separate the fruits from their backgrounds. Contrast Limited Adaptive Histogram Equalisation (CLAHE) was then applied to enhance colour and highlight skin tone, ripeness indicators, and surface defects. This step is crucial for grading because it exposes the subtle differences between "good", "medium", and "bad" fruits. Finally, all images were resized to 224 × 224 pixels, which keeps the inputs consistent and allows fruits of different classes to be compared directly. TensorFlow's ImageDataGenerator handles these pre-processing tasks and generates a refined dataset for accurate fruit grading.
- Data source location: Data were collected using mobile phone cameras from the following local markets in Bangladesh:
   1. Karwan Bazar, Tejgaon, Dhaka-1225 (Latitude: 23.75218546019505, Longitude: 90.39409484060864)
   2. Park Bazar, Tangail Sadar, Tangail-1900 (Latitude: 24.251049783304733, Longitude: 89.91194110302392)
   3. Santosh Bazar, University Road, Tangail-1902 (Latitude: 24.2337964036924, Longitude: 89.89024064381567)
   4. Jatrabari Bazar, Dhaka-1232 (Latitude: 23.709256640583664, Longitude: 90.43392330454013)
- Data accessibility: Repository name: Mendeley Data; Data identification number: 10.17632/bz65dz2pbj.1; Direct URL to data: https://data.mendeley.com/datasets/bz65dz2pbj/1
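The CLAHE and resizing steps described above can be sketched without TensorFlow. The following Python/NumPy code is an illustration under stated assumptions, not the authors' pipeline. `clip_limited_equalize` and `resize_nearest` are hypothetical helper names, and the clip limit of 2.0 is an assumed value. Only the per-tile core of CLAHE is shown: clip-limited histogram equalization applied to a whole single-channel image. Full CLAHE also splits the image into tiles and blends neighbouring tile mappings with bilinear interpolation. For colour images it is commonly applied to a lightness channel rather than to each RGB channel.

```python
import numpy as np

def clip_limited_equalize(channel: np.ndarray, clip_limit: float = 2.0,
                          n_bins: int = 256) -> np.ndarray:
    """Clip-limited histogram equalization of a single uint8 channel.

    This is the per-tile step of CLAHE applied to the whole image
    (no tiling and no interpolation between tiles).
    """
    hist = np.bincount(channel.ravel(), minlength=n_bins).astype(np.float64)
    # Cap every bin at clip_limit times the average bin height ...
    limit = clip_limit * channel.size / n_bins
    excess = np.maximum(hist - limit, 0.0).sum()
    hist = np.minimum(hist, limit)
    # ... and spread the clipped excess evenly over all bins.
    hist += excess / n_bins
    # The cumulative histogram (it still sums to channel.size) is the intensity mapping.
    lut = np.cumsum(hist) * (n_bins - 1) / channel.size
    lut = np.clip(np.round(lut), 0, n_bins - 1).astype(np.uint8)
    return lut[channel]

def resize_nearest(img: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Nearest-neighbour resize of an H x W (x C) array to size = (height, width)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A synthetic low-contrast "image": intensities squeezed into 100..139.
    gray = rng.integers(100, 140, size=(300, 400), dtype=np.uint8)
    enhanced = clip_limited_equalize(gray)
    print(gray.std() < enhanced.std())     # True: contrast is stretched
    print(resize_nearest(enhanced).shape)  # (224, 224)
```

In practice an existing implementation would normally be used instead, such as OpenCV's `cv2.createCLAHE(clipLimit=..., tileGridSize=...)`. The sketch only makes the clipping and redistribution step explicit.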
1. Value of the Data
- This dataset includes images of six fruit types that can be used for quality grading. The images were collected from various local markets in Bangladesh and categorized as (a) good, (b) medium, and (c) bad.
- This dataset provides real-world examples of categorizing different fruit species by physical features and quality, such as ripeness and freshness. It is useful for creating machine learning models for agricultural, retail, and quality control applications.
- This dataset makes it possible to build an automated, AI-based fruit grading system. Such a system can reduce labour costs and reliance on fruit experts at farms or marketplaces, supporting a sustainable automated agriculture system.
- Researchers in relevant fields can use this dataset to automate fruit grading, quality control, and biodiversity studies through machine learning or other artificial intelligence methods. It is also useful for farmers, food dealers, food scientists, botanists, biodiversity scientists, and others interested in fruit grading systems.
- The images record fruit size, colour, and texture across varieties and grades, which can reveal aspects of biodiversity. Research on how climate, soil, and production practices affect these fruits can support conservation and sustainable agriculture by identifying resilient or high-quality varieties.
- Adding more photos of the existing fruit types would make this dataset more robust. It would also capture a wider range of variations, such as differences in size, shape, color, and texture caused by environmental conditions or ripeness stages. Adding photos of other fruits and vegetables would further increase the dataset's richness and usefulness, making it suitable for a broader range of applications, including multi-class classification and object detection.
2. Background
Fruits are well known for their nutritious content and delicious flavour. Asian fruits come in a wide variety, including tropical types prized for their flavour and healthful properties. The main constraint for horticultural crops is the lack of fruit datasets with consistent images. This gap hampers the creation of accurate and fast automatic fruit classification systems. Fruit storage, processing, and export sectors require accurate and timely classification of fruit by quality and ripeness. Kumari and Singh [1] provide a comprehensive dataset of banana and guava fruits, categorized into three classes (Class A, Class B, and Defect) based on physiological changes. In [2], images are categorized by the fruit's maturity level (mature, half-mature, and mature). Meshram and Patil [3] developed an image dataset of Indian fruits that incorporates quality parameters for fruits that are widely consumed or exported. They selected six fruits (apple, banana, guava, lime, orange, and pomegranate) to compile it. A clean, well-curated dataset is an elementary requirement for building accurate and robust machine learning models for real-time environments. About 30–35 % of harvested fruit is lost because there are not enough trained workers. The subjective nature of human vision also makes accurate fruit identification, categorization, and grading a challenge. Therefore, the fruit industry needs automated systems [4,5].
Much previous research has focused on particular fruits such as bananas or guavas, or on types popular in India such as apple, guava, lime, orange, or pomegranate. This dataset extends the scope to six Asian fruits: tomato, papaya, apple, banana, Burmese grape, and mango. By including under-researched fruits such as the Burmese grape, this assortment highlights regional diversity and addresses gaps in existing knowledge. Manual fruit sorting and grading are inefficient and time-consuming. Automatic segregation systems use computer vision and deep learning to reduce human labor, cost, and time [6–8]. Moreover, image processing enables machine-controlled fruit sorting with better quality, productivity, and labor efficiency [9,10].
Using this dataset, researchers can classify fruits and grade their quality at three levels (good, medium, and bad). Most datasets focus on the economically attractive appearance and maturity of fruit. This dataset also provides insights into biodiversity: it covers fruits of the Asian subcontinent, supporting the study of genetic and phenotypic variation and improving understanding of fruit biodiversity. Table 1 compares the present dataset with existing ones.

Table 1. Comparison with existing datasets (values are numbers of images; X indicates that the fruit is not included).

| SL | Fruit | Our dataset | A. Kumari, J. Singh [1] | V. Meshram and K. Patil [3] | S. K. Behera, A. K. Rath, and P. K. Sethy [5] |
|---|---|---|---|---|---|
| 1 | Apple | 482 | X | 2403 | X |
| 2 | Banana | 484 | 1193 | 2485 | X |
| 3 | Burmese Grape | 630 | X | X | X |
| 4 | Mango | 618 | X | X | X |
| 5 | Papaya | 451 | X | X | 300 |
| 6 | Tomato | 502 | X | X | X |
3. Data Description
A total of 3,167 images with dimensions of 224 × 224 pixels were collected from several local marketplaces throughout Bangladesh. Six fruits (tomato, papaya, mango, Burmese grape, apple, and banana) were chosen for quality evaluation. Each of the six fruits is divided into three quality grades (good, medium, and bad), creating a total of 18 subcategories.
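The label structure described above (six fruits × three grades = 18 subcategories) can be expressed as a small lookup table. The sketch below is illustrative, and the constant names are hypothetical. The `Fruit/grade` label strings mirror the folder names listed in Table 3. The per-fruit totals are copied from the "Our dataset" column of Table 1, and summing them reproduces the stated total of 3,167 images.

```python
from itertools import product

FRUITS = ["Apple", "Banana", "Burmese Grape", "Mango", "Papaya", "Tomato"]
GRADES = ["1st grade", "2nd grade", "3rd grade"]  # good, medium, bad

# Per-fruit image counts from the "Our dataset" column of Table 1.
FRUIT_TOTALS = {"Apple": 482, "Banana": 484, "Burmese Grape": 630,
                "Mango": 618, "Papaya": 451, "Tomato": 502}

# One label per fruit/grade pair, e.g. "Mango/2nd grade".
CLASSES = [f"{fruit}/{grade}" for fruit, grade in product(FRUITS, GRADES)]
CLASS_TO_INDEX = {name: index for index, name in enumerate(CLASSES)}

if __name__ == "__main__":
    print(len(CLASSES))                       # 18
    print(sum(FRUIT_TOTALS.values()))         # 3167
    print(CLASS_TO_INDEX["Mango/2nd grade"])  # 10
```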
These data were collected in July, August, and September 2023 under a variety of weather conditions (sunny, cloudy, and rainy) and at different times of the day (morning, noon, and afternoon). Several phone cameras were used, including the Mi 9T, the iPhone SE, and a Vivo phone, and all images were brought to the same resolution. The fruit images were gathered from several local marketplaces in Bangladesh, including Karwan Bazar (Tejgaon, Dhaka-1225), Park Bazar (Tangail Sadar, Tangail-1900), Santosh Bazar (University Road, Tangail-1902), and Jatrabari Bazar (Dhaka-1232).
Table 2 provides details about the photo collection for this dataset: the fruit name, the weather at the time of capture, the date and time of day, the camera device, and the capture location. These details help clarify how the dataset was acquired and its relevance to the study.

Table 2. Data collection details.

| Task | Fruit/Vegetable | Weather | Date | Time | Camera device | Location |
|---|---|---|---|---|---|---|
| Grading | Apple | Cloudy | 3-Aug-23 | Afternoon | iPhone SE | Karwan Bazar (Latitude: 23.75218546019505, Longitude: 90.39409484060864) |
| Grading | Banana | Sunny | 8-Jul-23 | Morning | Mi 9T | Park Bazar, Tangail (Latitude: 24.251049783304733, Longitude: 89.91194110302392) |
| Grading | Burmese Grape | Rainy | 22-Aug-23 | Noon | Vivo | Santosh Bazar (Latitude: 24.2337964036924, Longitude: 89.89024064381567) |
| Grading | Mango | Sunny | 13-Sep-23 | Afternoon | iPhone SE | Karwan Bazar (Latitude: 23.75218546019505, Longitude: 90.39409484060864) |
| Grading | Papaya | Cloudy | 27-Jul-23 | Morning | Mi 9T | Park Bazar, Tangail (Latitude: 24.251049783304733, Longitude: 89.91194110302392) |
| Grading | Tomato | Rainy | 10-Aug-23 | Noon | Vivo | Santosh Bazar (Latitude: 24.2337964036924, Longitude: 89.89024064381567) |
Table 3 shows the distribution of images for six commonly used Asian fruits: apple, banana, Burmese grape, mango, papaya, and tomato. Each fruit is divided into three quality grades: fully fresh (1st grade), somewhat fresh (2nd grade), and seriously rotten (3rd grade).

Table 3. Dataset distribution of quality grades for six Asian fruits.

| Fruit | Grade | Description | Dataset folder | Images |
|---|---|---|---|---|
| Apple | 1st grade | Completely fresh, with no obvious signs of damage or rot; at peak condition and ready to be eaten or sold. | Apple/1st grade | 198 |
| Apple | 2nd grade | Minor imperfections or slight signs of rot; shorter shelf life but still fit for consumption or processing. | Apple/2nd grade | 172 |
| Apple | 3rd grade | Highly rotten and unfit for human consumption; mostly discarded as waste. | Apple/3rd grade | 112 |
| Banana | 1st grade | Completely fresh and in good shape, with no rot or damage; suitable for immediate eating or sale. | Banana/1st grade | 243 |
| Banana | 2nd grade | Medium condition: minor imperfections or slight signs of rot, but still suitable for consumption or further processing. | Banana/2nd grade | 239 |
| Banana | 3rd grade | Highly rotten and inappropriate for consumption; normally thrown away as waste. | Banana/3rd grade | not reported |
| Burmese Grape | 1st grade | Fresh, fully ripe, and in good condition. | Burmese Grape/1st grade | 308 |
| Burmese Grape | 2nd grade | Medium condition, with slight imperfections or early signs of rot. | Burmese Grape/2nd grade | 157 |
| Burmese Grape | 3rd grade | Bad condition: heavily rotten and unfit for eating or any further processing. | Burmese Grape/3rd grade | 165 |
| Mango | 1st grade | Fully fresh, with no damage or signs of rot; good condition. | Mango/1st grade | 251 |
| Mango | 2nd grade | Not fully fresh; some parts show signs of rot, but still of medium quality for processing. | Mango/2nd grade | 250 |
| Mango | 3rd grade | Severely spoiled and unsuited for human consumption; usually thrown away. | Mango/3rd grade | 117 |
| Papaya | 1st grade | Peak condition, with no signs of spoilage or damage; good quality, suitable for both eating and selling. | Papaya/1st grade | 152 |
| Papaya | 2nd grade | Not completely fresh, with minor rotting. | Papaya/2nd grade | 147 |
| Papaya | 3rd grade | Mostly spoiled, with signs of rot. | Papaya/3rd grade | 152 |
| Tomato | 1st grade | Fully fresh, with no signs of rot or damage; good condition and ready for further processing. | Tomato/1st grade | 145 |
| Tomato | 2nd grade | Small flaws or early signs of going bad; not very fresh but possibly still suitable for cooking or preparation. | Tomato/2nd grade | 231 |
| Tomato | 3rd grade | Totally rotten and not fit for human consumption; should be discarded. | Tomato/3rd grade | 126 |
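A training pipeline would typically read these labels from the folder layout. The following sketch makes a hypothetical assumption: the downloaded archive is organised as `<root>/<Fruit>/<grade>/<image>.jpg`, with grade folders named as in Table 3. The actual layout of the Mendeley archive should be checked before use. `index_dataset`, `count_per_class`, and `GRADE_TO_QUALITY` are illustrative names. The mapping of 1st, 2nd, and 3rd grade to good, medium, and bad follows the grading scheme described in the text.

```python
from collections import Counter
from pathlib import Path

# Grade folder name -> quality label (good, medium, bad).
GRADE_TO_QUALITY = {"1st grade": "good", "2nd grade": "medium", "3rd grade": "bad"}
IMAGE_SUFFIXES = {".jpg", ".jpeg"}

def index_dataset(root):
    """Return sorted (path, fruit, quality) tuples for images in root/<Fruit>/<grade>/."""
    records = []
    for path in sorted(Path(root).glob("*/*/*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip sub-folders and non-image files
        grade = path.parent.name.strip()
        fruit = path.parent.parent.name.strip()
        quality = GRADE_TO_QUALITY.get(grade)
        if quality is not None:  # skip folders that are not grade folders
            records.append((path, fruit, quality))
    return records

def count_per_class(records):
    """Count images per (fruit, quality) pair, e.g. to compare against Table 3."""
    return Counter((fruit, quality) for _, fruit, quality in records)
```

For example, `count_per_class(index_dataset("AFruitDB"))` would return the per-class counts for a local copy. The directory name is hypothetical. Comparing the result with Table 3 is a quick way to confirm that a download is complete.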
4. Experimental Design, Materials and Methods
The materials required for the fruit-grading data collection are listed in Table 2. The table also gives the data collection locations, weather conditions, and devices used.
The fruit-grading dataset contains six types of fruit, each in three grades. To define these three classes, our team met with a food engineer to gather information for the grading process. This covered how to assign fruits to the three grades and which characteristics matter most. Based on this information, we collected sample data for every fruit and had the food engineer examine it. We then collected the fruit-grading dataset from different local markets. Table 4 shows samples from the dataset. The 1st grade represents fully fresh fruits, the 2nd grade partly rotten fruits, and the 3rd grade almost rotten fruits.
Table 4. Sample of dataset.
The fruit-grading data collection process followed several steps, as shown in Fig. 1. First, several local markets well known for selling fruit were selected. We then visited these markets with the selected devices, collected fruits of each type and grade, and captured images of them. The images were examined to identify which ones needed cleaning or resizing, and those that did were cleaned and resized. A food expert then checked the processed images. Finally, the verified images were stored.
Fig. 1. Data collection process of the fruit-grading dataset.
Limitations
While collecting samples, we faced some difficulties, especially with the Burmese grape, which was less common because of its seasonal availability. The dataset covers only six types of fruit, which may limit its application in broader markets that handle a wider variety of produce. Local market conditions and mobile cameras introduce variability. This may affect model consistency under lighting or background conditions that are not represented in the dataset. Moreover, certain classes contain fewer images; researchers can apply data augmentation to increase the number of samples as required for training their machine learning models.
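The following Python/NumPy code is a minimal sketch of the augmentation suggested above. It oversamples a small class with random 90° rotations and horizontal flips until the class reaches a target size. The function name, the choice of transforms, and the fixed seed are assumptions for illustration. The images are assumed to be square (224 × 224), so rotations keep the shape. TensorFlow's `ImageDataGenerator`, mentioned in the specifications, offers comparable options such as `rotation_range` and `horizontal_flip`.

```python
import numpy as np

def oversample(images, target, seed=0):
    """Pad `images` (a list of H x W x C arrays) up to `target` items.

    New items are random 90-degree rotations of existing images, each
    horizontally flipped with probability 0.5. Originals are kept unchanged.
    """
    if not images:
        raise ValueError("need at least one image to augment")
    rng = np.random.default_rng(seed)
    augmented = list(images)
    while len(augmented) < target:
        source = images[int(rng.integers(len(images)))]
        variant = np.rot90(source, k=int(rng.integers(4)))  # rotates axes 0 and 1
        if rng.random() < 0.5:
            variant = np.fliplr(variant)
        augmented.append(variant)
    return augmented

if __name__ == "__main__":
    # E.g. pad Apple/3rd grade (112 images) to the size of Apple/1st grade (198).
    small_class = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(112)]
    print(len(oversample(small_class, target=198)))  # 198
```

Rotations and flips leave colour and texture untouched, which matters here because those cues separate the quality grades. Colour-altering augmentations would need more care for the same reason.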
Ethics Statement
This dataset was collected ethically from local markets in Bangladesh without harming ecosystems or agricultural practices. It is intended for educational and research purposes, supporting advancements in machine learning and sustainable agriculture. Users must acknowledge the source and avoid misuse for unethical or harmful activities. The goal is to promote responsible innovation and biodiversity conservation.
CRediT Author Statement
Mayen Uddin Mojumdar: Conceptualization, Data Curation, Supervision; Shahrin Islam: Software, Writing - Original Draft; Md Al Mamun: Project administration, Methodology; Rifat Hasan: Data Curation; Shah Md Tanvir Siddiquee: Visualization; Narayan Ranjan Chakraborty: Writing - Review.
References
- [1] A. Kumari, J. Singh, Banana and Guava dataset for machine learning and deep learning-based quality classification, Data Brief 57 (2024) 111025. doi:10.1016/j.dib.2024.111025 (PMC11547028; PMID 39525654)
- [2] P. Pathmanaban, B.K. Gnanavel, S.S. Anandan, Comprehensive guava fruit data set: digital and thermal images for analysis and classification, Data Brief 50 (2023) 109486. doi:10.1016/j.dib.2023.109486 (PMC10450831; PMID 37636131)
- [3] V. Meshram, K. Patil, FruitNet: Indian fruits image dataset with quality for machine learning applications, Data Brief 40 (2022) 107686. doi:10.1016/j.dib.2021.107686 (PMC8668825; PMID 34917715)
- [4] S. Behera, A. Rath, A. Mahapatra, P.K. Sethy, Identification, classification & grading of fruits using machine learning & computer intelligence: a review, J. Ambient Intell. Humaniz. Comput. (2020) 1–11.
- [5] S.K. Behera, A.K. Rath, P.K. Sethy, Maturity status classification of papaya fruits based on machine learning and transfer learning approach, Inf. Process. Agric. 8 (2021) 244–250.
- [6] J.N.V.D.T. Nerella, V.K. Nippulapalli, S. Nancharla, L.P. Vellanki, P.S. Suhasini, Performance comparison of deep learning techniques for classification of fruits as fresh and rotten, in: International Conference on Recent Advances in Electrical, Electronics, Ubiquitous Communication, and Computational Intelligence (RAEEUCCI), 2023.
- [7] A. Hayat, F. Morgado-Dias, T. Choudhury, T.P. Singh, Fruit Vision: a deep learning based automatic fruit grading system, Open Agric. 9 (2024) 20220276.
- [8] Y.A. Ohali, Computer vision based date fruit grading system: design and implementation, J. King Saud Univ.-Comput. Inf. Sci. 23 (2011) 29–36.
