Classifying Shelf Life Quality of Pineapples by Combining Audio and Visual Features
Yi-Lu Jiang, Wen-Chang Chang, Ching-Lin Wang, Kung-Liang Hsu, Chih-Yi Chiu

TL;DR
This study develops a multimodal classification model that combines audio and visual features to assess pineapple shelf life quality, reaching 84% accuracy while training on a compact sample of the data for efficiency.
Contribution
It introduces a cross-modal classification approach based on a modified contrastive audiovisual masked autoencoder (CAV-MAE), together with a new dataset, PQC500, for pineapple quality assessment.
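The contrastive half of a CAV-MAE-style model pulls matching audio and visual embeddings together and pushes mismatched ones apart. The following is a generic symmetric InfoNCE loss over a batch of paired embeddings, offered only as an illustration of that idea. It is not the authors' modified objective, which this summary does not describe, and the function names and the default temperature are assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _mean_diag_cross_entropy(matrix: List[List[float]]) -> float:
    """Average cross-entropy where row i's correct 'class' is column i."""
    total = 0.0
    for i, row in enumerate(matrix):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / len(matrix)


def info_nce(audio: List[Vector], visual: List[Vector], temperature: float = 0.05) -> float:
    """Symmetric InfoNCE: audio[i] and visual[i] are a positive pair;
    every other item in the batch serves as a negative (illustrative sketch)."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual batches must have the same size")
    a_emb = [l2_normalize(x) for x in audio]
    v_emb = [l2_normalize(x) for x in visual]
    # Cosine similarities between every audio clip and every image, scaled by temperature.
    sim = [[sum(p * q for p, q in zip(a, v)) / temperature for v in v_emb] for a in a_emb]
    sim_t = [list(col) for col in zip(*sim)]  # visual-to-audio direction
    return 0.5 * (_mean_diag_cross_entropy(sim) + _mean_diag_cross_entropy(sim_t))
```

In the original CAV-MAE, a contrastive term of this kind is added to a masked-reconstruction loss. Here the loss is close to zero when each clip is most similar to its own image and grows sharply when the pairs are shuffled.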
Findings
The cross-modal model achieved 84% accuracy.
It outperformed the audio-only and visual-only models by 6% and 18%, respectively.
Sampling a compact training subset improved computational efficiency.
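The findings do not say how the compact training set was drawn. One plausible reading, shown here purely as an assumption, is stratified random sampling that keeps the same number of examples from each of the four quality levels. `sample_compact_subset` and its `per_class` parameter are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_compact_subset(
    items: Sequence[T], labels: Sequence[int], per_class: int, seed: int = 0
) -> List[Tuple[T, int]]:
    """Draw at most `per_class` items from each quality level (hypothetical scheme)."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    # Group the training items by their quality label.
    by_label: Dict[int, List[T]] = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    subset: List[Tuple[T, int]] = []
    for label in sorted(by_label):
        pool = by_label[label]
        k = min(per_class, len(pool))  # small classes contribute everything they have
        subset.extend((item, label) for item in rng.sample(pool, k))
    return subset
```

Balancing per level keeps a shrunken training set from drifting toward whichever quality level happens to be most common.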
Abstract
Determining the shelf life quality of pineapples with non-destructive methods is a crucial step toward reducing waste and increasing income. In this paper, we constructed a multimodal, multi-view classification model that classifies pineapples into four quality levels based on their audio and visual characteristics. For research purposes, we compiled and released the PQC500 dataset of 500 pineapples captured in two modalities: sounds recorded by multiple microphones while the pineapples were tapped, and pictures taken by multiple cameras at different locations. Together these provide multimodal and multi-view audiovisual features. We modified the contrastive audiovisual masked autoencoder to train the cross-modal classification model on abundant combinations of audio and visual pairs. In addition, we proposed sampling a compact subset of the training data for efficient computation. The experiments…
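Because each pineapple is recorded by several microphones and photographed by several cameras, pairing every clip with every image of the same fruit multiplies the number of training pairs. A minimal sketch of that idea, assuming a hypothetical `PineappleSample` record and plain file-path strings: with, say, 3 microphones and 4 cameras, one fruit yields 3 × 4 = 12 labelled pairs. This illustrates "abundant combinations of audio and visual pairs" and is not the authors' released code.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass
class PineappleSample:
    """Hypothetical record for one fruit in a PQC500-style dataset."""
    fruit_id: str
    quality_level: int       # one of the four quality levels (0-3 assumed here)
    audio_clips: List[str]   # e.g. one tapping recording per microphone
    images: List[str]        # e.g. one picture per camera position


def audiovisual_pairs(samples: List[PineappleSample]) -> List[Tuple[str, str, int]]:
    """Cross every audio clip with every image of the SAME fruit,
    so each pair inherits that fruit's quality label."""
    pairs: List[Tuple[str, str, int]] = []
    for s in samples:
        for clip, image in product(s.audio_clips, s.images):
            pairs.append((clip, image, s.quality_level))
    return pairs
```

Restricting pairs to a single fruit keeps every label consistent. Mixing a clip from one fruit with an image of another would pair evidence from two different quality levels.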
Taxonomy
Topics: Smart Agriculture and AI · Music and Audio Processing · Postharvest Quality and Shelf Life Management
