Application of Image Computing in Non-Destructive Detection of Chinese Cuisine
Xiaowei Huang, Zexiang Li, Zhihua Li, Jiyong Shi, Ning Zhang, Zhou Qin, Liuzi Du, Tingting Shen, Roujia Zhang

TL;DR
This paper introduces a new hyperspectral imaging method to detect the quality and authenticity of Chinese cuisine non-destructively.
Contribution
The study introduces a novel hyperspectral imaging framework with deep learning for non-destructive detection of Chinese cuisine ingredients and quality.
Findings
The model achieved 97.8% average classification accuracy across 15 Chinese dish categories.
It quantified chili oil in Mapo Tofu with 0.43% w/w MAE and assessed dim sum freshness with 95.2% accuracy.
The method improved classification accuracy by over 15 percentage points compared to traditional RGB methods.
Abstract
Food quality and safety are paramount in preserving the culinary authenticity and cultural integrity of Chinese cuisine, characterized by intricate ingredient combinations, diverse cooking techniques (e.g., stir-frying, steaming, and braising), and region-specific flavor profiles. Traditional non-destructive detection methods often struggle with the unique challenges posed by Chinese dishes, including complex textural variations in staple foods (e.g., noodles, dumplings), layered seasoning compositions (e.g., soy sauce, Sichuan peppercorns), and oil-rich cooking media. This study pioneers a hyperspectral imaging framework enhanced with domain-specific deep learning algorithms (spatial–spectral convolutional networks with attention mechanisms) to address these challenges. Our approach effectively deciphers the subtle spectral fingerprints of Chinese-specific ingredients (e.g., fermented…
Funding
- National Key Research and Development Program of China
- Special Funds for Jiangsu Province Science and Technology Plans
- National Natural Science Foundation of China
- Natural Science Foundation of Jiangsu Province
- Foundation of Jiangsu Specially-Appointed Professor
- Earmarked Fund for China Agriculture Research System
Taxonomy
Topics: Spectroscopy and Chemometric Analyses · Advanced Chemical Sensor Technologies · Phytochemicals and Antioxidant Activities
1. Introduction
Chinese cuisine is celebrated for its immense diversity, rich flavors, and a wide array of cooking techniques. Even a single dish can exhibit significant variations in preparation methods across regions, resulting in notable differences in both taste and presentation [1,2]. This culinary diversity not only underscores the profound historical and cultural significance of Chinese food but also highlights its strong regional identities [3]. However, this complexity and variety pose considerable challenges to standardization in industrial food production, complicating efforts to establish consistent methods for assessing nutritional values, such as calorie content [4,5]. Such inconsistencies present a critical obstacle to addressing the needs of modern, fast-paced lifestyles, which demand convenience while emphasizing scientifically informed, health-conscious dietary practices.
In response to these challenges, the rapid advancement of artificial intelligence (AI) and rising living standards have positioned food image recognition as a pivotal research area in health and dietary management [6]. AI technology has gained considerable recognition in non-destructive detection due to its potential to enhance food safety, optimize production processes, and improve consumer experiences [7]. However, Chinese cuisine, characterized by its complexity and diversity, presents unique challenges for automated recognition systems [8]. These challenges include intricate ingredient combinations, visual diversity across regions, and difficulties in nutritional estimation [9,10]. Therefore, it becomes essential to explore how advanced image computing technologies can adapt to and effectively support Chinese culinary applications.
This study explores the application of image computing technologies in the non-destructive detection of Chinese cuisine, with a particular focus on their potential for nutritional estimation and health assessment. Accurate food recognition not only facilitates individual health management but also offers innovative solutions across various sectors of the food industry. Monitoring dietary intake is essential for understanding individual eating habits, identifying unhealthy patterns, and ensuring balanced nutrition [11,12]. A well-balanced diet provides adequate energy and nutrients, strengthens the immune system, supports overall health, and contributes to disease prevention [13]. Moreover, a balanced diet is critical for meeting diverse nutritional requirements, promoting optimal health, and avoiding digestive strain caused by excessive consumption [14]. The type and quantity of food consumed directly influence blood glucose levels, and personalized food pairings can play a significant role in diabetes management [8]. Additionally, a structured diet supports cardiovascular health, as specific foods have been shown to reduce the risk of related diseases.
Furthermore, food allergies (Table 1), a serious health concern, can affect multiple organ systems and may result in life-threatening anaphylaxis [15]. In the context of complex dishes like Chinese cuisine, the detection of allergens is particularly critical. AI-powered, non-invasive testing of food materials has the potential to mitigate allergy risks and enhance food safety [16]. Thus, advancing intelligent food recognition is not only a technical goal but also a public health imperative [6,17].
Traditional deep learning models utilizing RGB images have advanced the recognition and quality assessment of Chinese food ingredients but continue to face limitations in accuracy and generalizability [18]. Hyperspectral imaging offers a promising alternative by capturing the chemical composition of ingredients [19,20], thereby enhancing feature extraction and classification when integrated with deep learning techniques [21]. This synergy improves recognition accuracy and holds significant potential for applications in dietary health and nutritional analysis [22,23,24]. Future research should prioritize optimizing hyperspectral image acquisition and processing to improve data reliability, alongside refining deep learning models to reduce computational complexity and enhance overall performance [24]. The combination of hyperspectral imaging and deep learning represents a transformative approach to ingredient recognition and health evaluation [25].
Given the rapid development and inherent limitations of current methods, this paper provides a systematic review of image computing technologies for non-destructive food detection, with a focus on Chinese cuisine. By comparing domestic and international research, it identifies prevailing methodologies, emerging trends, and critical challenges in the field. Rather than cataloging existing technologies, this review analyzes how current approaches engage with the unique visual and structural complexities of Chinese dishes. In doing so, it reveals key knowledge gaps and underexplored potentials, offering future research directions that can support the intelligent digital transformation of Chinese culinary culture.
Ultimately, this work aims to contribute to both the modernization of food heritage and the growing demands for food safety and health management.
To ensure a comprehensive and reproducible review, a systematic literature search was conducted. The search strategy focused on identifying peer-reviewed articles published in English within the past decade (2014–2025), primarily utilizing major academic databases such as Web of Science, Scopus, PubMed, and IEEE Xplore. Key search terms included combinations related to “image computing,” “computer vision,” “non-destructive detection,” “food recognition,” “nutrition estimation,” “hyperspectral imaging,” and “Chinese food/cuisine,” among others. The selection criteria prioritized studies demonstrating applications in the context of Chinese cuisine and its inherent complexities. A detailed description of the screening process and eligibility criteria is provided in the following methodology section.
2. Literature Search Methodology
This study employed a systematic literature review approach, adhering strictly to the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to ensure transparency, reproducibility, and methodological rigor throughout the search and selection process. The review focused on academic publications spanning from January 2014 to March 2025, with the goal of comprehensively capturing significant developments in the application of deep learning and hyperspectral imaging techniques for the detection of Chinese cuisine.
The search strategy encompassed both international core databases—Web of Science Core Collection, Scopus, IEEE Xplore, and PubMed—and leading Chinese-language databases, including CNKI (China National Knowledge Infrastructure) and Wanfang Data. In addition, to account for grey literature such as conference proceedings, preprints, and theses, supplementary searches were conducted via Google Scholar and Lens.org.
Given the regional diversity and technical complexity of Chinese cuisine, the search strategy was designed using Boolean operators (AND, OR, NOT) across three key conceptual clusters:
- Chinese Cuisine Features: e.g., Chinese cuisine, Mapo Tofu, dumpling texture, and regional flavor.
- Technical Approaches: e.g., hyperspectral imaging, spatial–spectral CNN, and attention mechanism.
- Application Objectives: e.g., non-destructive testing, food freshness, and allergen detection.
The search strings were pretested and refined to balance sensitivity and specificity, employing truncation and phrase searching where appropriate. A representative search expression is ("Chinese dish" OR "dim sum") AND ("deep learning" OR CNN) AND ("food safety" OR "oil content").
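The way such an expression is assembled from the three conceptual clusters can be sketched in a few lines of Python. This is only an illustration: the helper name build_query is hypothetical, the terms are the ones quoted above, and real databases differ in their exact query syntax.

```python
def build_query(*clusters):
    """Join terms with OR inside each cluster and join clusters with AND.

    Multi-word terms are wrapped in double quotes for phrase searching.
    """
    def fmt(term):
        return f'"{term}"' if " " in term else term

    groups = ["(" + " OR ".join(fmt(t) for t in cluster) + ")"
              for cluster in clusters]
    return " AND ".join(groups)


# One term list per conceptual cluster (cuisine, technique, objective).
query = build_query(
    ["Chinese dish", "dim sum"],
    ["deep learning", "CNN"],
    ["food safety", "oil content"],
)
print(query)
```

Keeping each cluster as a separate list makes it straightforward to swap in translated or database-specific terms without touching the Boolean structure.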
2.1. Inclusion and Exclusion Criteria
Studies were included if they met the following criteria: (1) focused on the detection or analysis of prepared Chinese dishes; (2) utilized image-based computational methods, preferably integrating hyperspectral imaging with deep learning; (3) reported experimental validation and quantitative metrics such as classification accuracy or compositional error; and (4) were published in peer-reviewed journals or conference proceedings.
Exclusion criteria were as follows: (1) studies limited to raw agricultural products; (2) studies relying on non-imaging methods such as biochemical assays; and (3) research unrelated to Chinese cuisine.
2.2. Screening Process
The screening process was conducted in two stages. In the first stage, two independent reviewers screened the titles and abstracts of 1842 records. In the second stage, the full texts of 106 potentially relevant articles were reviewed. Disagreements between reviewers were resolved by a third reviewer, with inter-rater agreement assessed using Cohen’s kappa (κ = 0.87). A total of 127 studies were ultimately included for quality assessment and data synthesis. The entire selection process is visually depicted in the PRISMA flow diagram.
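For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The following minimal Python sketch computes it for two lists of screening decisions. The ten decisions are hypothetical illustration data, not records from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical title/abstract decisions for ten records.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "exclude", "include", "exclude", "include", "exclude"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

In this toy case the raters agree on 8 of 10 records (p_o = 0.80) while chance agreement is 0.58, so kappa is considerably lower than the raw agreement rate.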
2.3. Reference Management and Supplementary Searches
EndNote X9 was employed for reference management, supplemented by manual backward citation screening to minimize the risk of overlooking relevant studies. Additionally, to ensure comprehensive coverage of culturally specific terminology (e.g., 麻婆豆腐 [Mapo Tofu], 复合调味 [compound seasoning]), equivalent keyword searches were performed in the CNKI and Wanfang databases using Chinese-language terms.
3. Classification of Ordinary Dish Images
This section systematically reviews the development of image classification techniques for recognizing ordinary dish images, focusing on dataset evolution, key methods, and the specific challenges encountered in recognizing Chinese cuisine. Compared to Western-style dishes, Chinese cuisine presents more complex image features in visual recognition tasks, primarily in three respects. First, Chinese dishes exhibit a high degree of ingredient mixing, where a single dish often contains multiple ingredients with indistinguishable boundaries in images [26]. For instance, in dishes like “fish-flavored shredded pork” or “twice-cooked pork,” vegetables and meat are typically stir-fried together and take on similar colors and textures. Second, appearance is not standardized. Due to chefs’ personal styles and regional variations, the same Chinese dish may differ significantly in shape, plating, and coloration, which makes model generalization more difficult [27]. Third, Chinese cuisine frequently features soups and stewed dishes, which are characterized by strong surface reflections and loose structures, making feature extraction particularly challenging.
By comparing the effectiveness of different models and datasets, this section explores how various approaches address (or fail to address) the complexities inherent in food images—especially those of Chinese dishes—and points toward future research directions.
3.1. Established Dish Image Dataset
The rapid development of deep learning has made high-quality datasets crucial for improving model generalization and classification accuracy. In the food recognition domain, the diversity and representativeness of a dataset determine its applicability across different cuisines, cooking styles, and plating forms. This section reviews the evolution of food image datasets and their relevance to Chinese cuisine recognition.
Types and Evolution of Datasets
The foundational principles of deep learning are based on the ability of artificial neural networks to emulate the connectivity and functional mechanisms of neurons in the human brain [28], enabling the intelligent analysis of complex data types such as images, sounds, and text. During the training process, models utilize the backpropagation algorithm to iteratively adjust network weights, thereby minimizing the error between predicted and actual values. Simultaneously, activation functions introduce non-linear characteristics into the model, enabling the learning and representation of intricate patterns [29]. The effectiveness of these core mechanisms is highly dependent on the availability of large and diverse datasets. Deep learning models require extensive iterations and optimizations on such datasets to improve their generalization capabilities [30,31].
Data diversity is a critical determinant of the success of deep learning models. Rich and varied datasets enable models to capture multi-dimensional features and underlying patterns, thereby enhancing their performance in processing and classification tasks. Conversely, datasets with significant homogeneity or bias can lead to overfitting or poor generalization, which may constrain their practical applicability [32].
In recent years, the widespread adoption of smart devices and rapid advancements in internet technologies have led to an exponential increase in the volume of food image data, providing abundant resources to support deep learning applications in food recognition [33]. Table 2 presents several representative food image datasets developed in recent years, and Figure 1 illustrates the three primary computer vision tasks: classification, detection, and segmentation. These datasets encompass a diverse range of cuisines, including Japanese, Western, and Chinese dishes, thereby establishing a robust foundation for constructing efficient food image recognition models.
3.2. Image Classification Methods
3.2.1. Traditional Image Analysis Methods
In the 1980s, Zayas [44] made a pioneering contribution to the field of food image recognition by developing rule-based functions for image analysis, enabling the identification and differentiation of various wheat varieties. Building on this work, Lai [45] introduced an image analysis technique based on pattern recognition, which facilitated the measurement and extraction of features from different grain types for classification purposes. These early approaches typically involved a multi-step pipeline: image preprocessing (e.g., denoising, edge enhancement), manual feature extraction (such as color histograms, texture analysis using gray-level co-occurrence matrices, or shape descriptors like Hu moments), and rule-based or statistical classification using thresholding, principal component analysis (PCA), or simple classifiers like K-nearest neighbors (KNN). These manually designed features proved effective for structured and uniform food categories such as grains and fruits, laying a robust foundation for subsequent research in food image recognition [46].
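The handcrafted pipeline described above can be illustrated with a minimal Python sketch that pairs one manual feature (a joint RGB color histogram) with a KNN classifier. It is not a reconstruction of the methods in [44,45]: the function names, the bin count, and the pixel values are hypothetical and were chosen only to make the example self-contained.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of a list of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    return [count / len(pixels) for count in hist]


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    ranked = sorted(
        (sum((x - y) ** 2 for x, y in zip(features, query)), label)
        for features, label in train
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical uniform patches: yellowish "wheat" and whitish "rice".
train = [
    (color_histogram([(200, 170, 60)] * 16), "wheat"),
    (color_histogram([(220, 180, 30)] * 16), "wheat"),
    (color_histogram([(240, 240, 235)] * 16), "rice"),
    (color_histogram([(250, 245, 240)] * 16), "rice"),
]
prediction = knn_predict(train, color_histogram([(205, 175, 50)] * 16))
print(prediction)
```

Because the histogram discards all spatial layout, such a feature works for uniform grains but cannot separate the mixed, overlapping ingredients typical of prepared dishes.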
However, the visual complexity of Chinese cuisine—characterized by unstructured composition, overlapping ingredients, high intra-class variation, and regional presentation differences—poses unique challenges that traditional methods cannot effectively resolve. These approaches struggle to capture high-level semantic features and are easily affected by noise, background clutter, and variations in lighting and plating. As a result, more advanced, learning-based methodologies became necessary to address the intricacies of Chinese dish image classification [47].
3.2.2. The Rise of Deep Learning Methods
a. Convolutional Neural Networks (CNN).
Convolutional Neural Networks (CNNs) form the foundational architecture of modern deep learning, particularly in computer vision applications [48,49]. CNN-based object detectors are generally classified into two main approaches: single-stage and two-stage methods [50,51]. Although CNNs had been conceptualized by the 1990s, their early adoption was constrained by hardware limitations, particularly the lack of adequate computational resources such as Graphics Processing Units (GPUs), and by the immaturity of supporting algorithms [52]. However, rapid advancements in hardware, most notably the widespread availability of GPUs, coupled with continuous improvements in deep learning algorithms, have propelled CNNs to the forefront of computer vision research [53,54].
In 2012, Krizhevsky et al. [55] introduced AlexNet, a groundbreaking CNN model that leveraged GPUs to significantly accelerate training processes. A critical innovation in their work was the emphasis on model depth, which proved essential for achieving superior performance in image classification tasks. The success of AlexNet, highlighted by its victory in the ImageNet competition, marked a transformative moment in artificial intelligence research [56]. An example of food image classification is presented in Figure 2, demonstrating the application of CNN-based models in distinguishing among various food types.
Building on these advancements, Kusumoto et al. [57] extended the Bag-of-Features (BoF) model in 2014 by incorporating machine learning techniques based on sparse models and vector quantization. Their approach emphasized reconstructing local descriptors, significantly reducing the loss of image feature information and thereby improving feature extraction accuracy [58].
In 2017, Pandey et al. [59] advanced CNN architectures by integrating AlexNet, GoogLeNet, and ResNet into a novel multi-layer network. This design performed strongly on the ETH Food-101 dataset and on a custom dataset focused on Indian cuisine [60], highlighting the effectiveness of combining diverse network architectures to enhance classification accuracy. Continuing this trajectory, in 2018, Martinel et al. [61] developed a hybrid model that fused slice convolutions with ResNet, specifically designed to capture vertical structural features in images of Western cuisine. Their model focused on accurately recognizing vertically structured dishes such as burgers, club sandwiches, multi-layered cakes, and lasagna, which pose particular classification challenges due to overlapping ingredients and inconsistent presentation. The proposed system was deployed in scenarios such as smart restaurant ordering systems, digital dietary tracking tools, and semi-automated kitchen monitoring, aiming to improve both accuracy and interpretability in real-world applications. Their approach achieved a Top-1 accuracy of 90.27% on the Food-101 dataset, underscoring the potential of such innovations to significantly enhance image recognition performance [62].
Contemporary image classification methods primarily follow two competing paradigms: the CNN-based approach epitomized by ResNet and the Transformer-based approach pioneered by Vision Transformer (ViT). While the two differ in feature extraction, architecture design, and application domains, they have recently converged through various hybrid architectures. Their performance on the largest current food dataset is shown in Table 3.
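Since the results above are reported as Top-1 accuracy, it may help to make the metric explicit: a prediction counts as correct under Top-k when the true class is among the k highest-scoring classes. The sketch below is a minimal Python illustration, and the score matrix is hypothetical.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return correct / len(labels)


# Hypothetical class scores for four images over three dish categories.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 2, 0]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

Top-5 accuracy is often reported alongside Top-1 for fine-grained food datasets, because visually similar dishes frequently rank second or third.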
b. Object Detection and Semantic Segmentation.
In food-related scenarios, object detection and segmentation techniques enhance classification by distinguishing between overlapping and co-present food items. In agricultural domains, similar approaches have improved fruit detection, harvesting path planning, and crop row detection [77,78,79,80,81].
In nutrition monitoring, segmentation techniques enable precise identification of food components, supporting recipe generation and dietary guidance [82,83]. With the emergence of large pretrained models such as SAM and BEiT, segmentation has become critical for enhancing food image interpretation (Figure 3).
c. Knowledge Distillation and Few-Shot Learning.
These approaches address computational and data scarcity challenges. Knowledge distillation transfers knowledge from a large teacher model (e.g., ResNet-50) to a smaller student model, preserving accuracy at lower complexity [84,85]. Few-shot learning classifies new food types from only a few examples, which makes it well suited to dynamic or region-specific Chinese dishes.
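One widely used formulation of the distillation objective combines a temperature-softened KL term between teacher and student outputs with the usual cross-entropy on the true label. The Python sketch below assumes that formulation. It is not taken from [84,85], and the logits, the temperature, and the weight alpha are hypothetical values.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the student's softened output is from the teacher's.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Hard-label term: standard cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[true_index])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


teacher = [2.0, 1.0, 0.1]          # hypothetical teacher logits
aligned_student = [2.0, 1.0, 0.1]  # student that mimics the teacher
poor_student = [0.1, 1.0, 2.0]     # student that ranks the classes in reverse
print(distillation_loss(aligned_student, teacher, true_index=0))
print(distillation_loss(poor_student, teacher, true_index=0))
```

A higher temperature flattens the teacher distribution so that the relative scores of non-target dishes, which carry similarity information between categories, contribute more to the loss.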
d. Monocular Depth Estimation.
Volume estimation is critical for calorie assessment. Modern models use RGB-based monocular depth prediction via ViT, diffusion, and distillation models to infer food volumes, improving nutritional estimation over traditional 3D reconstruction techniques. Current state-of-the-art models achieve high precision in metric depth estimation (e.g., ZeroDepth [86] reduces scale ambiguity errors by 16.8% on the KITTI dataset, while Metric3D v2 [87] achieves 5% relative error without scale alignment). However, challenges persist in edge blurring and detail loss, especially for complex food geometries. For instance, PatchFusion improves resolution via tile-based fusion but requires 16–146× longer processing time than baseline methods, and diffusion-based models like Marigold suffer from temporal inconsistency in video applications [88,89].
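How a metric depth map translates into a volume estimate can be shown with a deliberately simplified Python sketch. It assumes a top-down camera, a known camera-to-table distance, a food segmentation mask, and a constant ground area per pixel. All names and numbers are hypothetical, and perspective effects are ignored.

```python
def food_volume_cm3(depth_cm, food_mask, table_depth_cm, pixel_area_cm2):
    """Sum per-pixel food heights times the ground area covered by one pixel.

    depth_cm:  2-D list of metric depths (camera to visible surface).
    food_mask: 2-D list of booleans marking food pixels.
    """
    volume = 0.0
    for depth_row, mask_row in zip(depth_cm, food_mask):
        for depth, is_food in zip(depth_row, mask_row):
            if is_food:
                # Height of the food column above the table at this pixel.
                height = max(table_depth_cm - depth, 0.0)
                volume += height * pixel_area_cm2
    return volume


# Hypothetical 3x3 scene: table at 40 cm, a 2 cm tall item covering 4 pixels.
depth = [[40.0, 40.0, 40.0],
         [40.0, 38.0, 38.0],
         [40.0, 38.0, 38.0]]
mask = [[False, False, False],
        [False, True, True],
        [False, True, True]]
volume = food_volume_cm3(depth, mask, table_depth_cm=40.0, pixel_area_cm2=0.25)
print(volume)
```

The sketch makes the sensitivity to depth error visible: any bias in the predicted depth is multiplied by the full footprint of the dish, which is why the edge blurring and scale ambiguity noted above matter for calorie estimation.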
3.2.3. Special Challenges and Solutions in Chinese Food Image Classification
Despite significant progress in food image recognition, applying these methods to Chinese cuisine remains challenging due to limitations in existing datasets. Most public datasets are built from web-crawled images, which introduce three major issues:
- Cross-domain and cross-category noise. Images retrieved online often include packaged foods, raw ingredients, or miscategorized dishes, which introduces semantic noise, and many labeled categories contain visually inconsistent or irrelevant samples. Figure 4 (left) shows how noise affects typical categories such as stir-fried cabbage or king oyster mushrooms. Moreover, dishes with the same name can look drastically different due to cooking variations or shooting angles, leading to cross-category confusion (Figure 4, right).
- High redundancy and low quality. Many web images are near-duplicates, inflating dataset size without adding meaningful diversity. In addition, issues such as background clutter, lighting variation, and low resolution further hinder feature extraction.
- Lack of regional diversity. Chinese cuisine is deeply regional. Existing datasets often reflect narrow or localized samples, causing deep models to overfit superficial features rather than learn intrinsic visual patterns.
These limitations directly impact classification accuracy on Chinese-specific benchmarks, as shown in Table 4.
To overcome these challenges, future work should do the following: curate high-quality, regionally diverse datasets with expert labeling; apply noise filtering and de-duplication techniques during data preparation; use robust models that incorporate semantic segmentation or contextual learning; and explore few-shot learning for underrepresented or variant-rich dishes. These directions will improve recognition performance and make AI systems more adaptable to the rich diversity of Chinese cuisine.
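As an illustration of the de-duplication step, the following Python sketch uses an average hash, one simple perceptual hashing scheme, to drop near-duplicate images. It assumes images have already been converted to small grayscale grids. The function names, the Hamming threshold, and the 2x2 toy images are hypothetical.

```python
def average_hash(gray):
    """Bit tuple marking which pixels of a small grayscale grid exceed its mean."""
    flat = [value for row in gray for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming(hash_a, hash_b):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def deduplicate(images, max_distance=1):
    """Keep the first image of every group whose hashes differ by <= max_distance bits."""
    kept = []
    for name, gray in images:
        image_hash = average_hash(gray)
        if all(hamming(image_hash, kept_hash) > max_distance for _, kept_hash in kept):
            kept.append((name, image_hash))
    return [name for name, _ in kept]


# Toy 2x2 "images": b is a slightly brightened copy of a, c is a different pattern.
images = [
    ("a.jpg", [[10, 200], [200, 10]]),
    ("b.jpg", [[20, 210], [210, 20]]),
    ("c.jpg", [[200, 10], [10, 200]]),
]
unique = deduplicate(images)
print(unique)
```

Because the hash depends only on whether each pixel is above the image mean, uniform brightness changes and mild recompression leave it unchanged, which is the typical signature of re-posted web images.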
4. Hyperspectral Imaging
Hyperspectral imaging (HSI) has emerged as a powerful non-destructive technique for food analysis, owing to its capacity to extract spectral–spatial fusion features that allow both qualitative and quantitative insights [96]. In this section, we explore the technical foundations, application scenarios, deep learning integration, and remaining challenges of HSI within the context of food—particularly Chinese cuisine. Rather than merely describing imaging technologies, this section aims to reveal how hyperspectral analysis resolves limitations of traditional food recognition while identifying key bottlenecks and future directions for intelligent food inspection [97].
4.1. Hyperspectral Imaging Techniques
As a modality for food image analysis, HSI offers advantages that go well beyond traditional imaging techniques. By capturing dense spectral information across the visible and near-infrared range, HSI enables precise identification of food attributes such as freshness, ripeness, moisture content, fat distribution, and contamination, which are often invisible to the human eye or conventional RGB cameras [98]. This makes it particularly valuable for applications including non-destructive quality inspection, early spoilage detection, adulteration screening, and intelligent sorting in production lines. The integration of spectral and spatial features not only enhances classification accuracy but also enables pixel-level analysis [99], which is critical for assessing heterogeneous or visually similar food products. These capabilities have positioned HSI as a key enabler of intelligent, automated, and data-driven decision-making in modern agri-food systems.
4.1.1. Hyperspectral Imaging Equipment and Technical Principles
Hyperspectral imaging captures three-dimensional data cubes (X-Y spatial axes and Z spectral axis), enabling highly detailed material analysis based on each substance’s spectral “fingerprint.” These fingerprints are formed through substance-specific reflection and absorption characteristics [100]—such as myoglobin oxidation peaks (660 nm) or chlorophyll absorption valleys (680 nm)—which serve as the physical foundation for non-invasive compositional detection [101].
To separate spectral components, systems use dispersive (high resolution), filtering (flexible), or interferometric (high SNR) methods. Most commercial systems adopt push-broom scanning, where line-array detectors acquire synchronized spectral slices during object displacement.
A typical HSI system comprises an optical module, detector, and data processing unit (Figure 5, left). In optical design, dispersive elements account for 30–50% of the system volume and require collaborative optimization with collimating and focusing lenses to enhance optical path efficiency [102]. The illumination module employs halogen lamps (covering 400–2500 nm broadband) or LED arrays (narrowband tunable) to ensure uniform lighting. Detector performance directly impacts detection sensitivity: silicon-based CCD/CMOS and InGaAs combinations are widely used in visible-shortwave infrared bands, achieving spatial resolutions up to 5 μm [103]. Push-broom scanning mode (adopted by 80% of commercial systems) dominates data acquisition, synchronizing line-array detectors with displacement stages to capture spatial–spectral information [104]. Snapshot techniques (e.g., coded aperture) improve frame rates but sacrifice resolution. Data processing involves three stages: radiometric correction (eliminating light source fluctuations), geometric correction (spatial registration error <0.1 pixel), and spectral unmixing (endmember extraction error <5%), supported by high-precision displacement stages (±0.1 mm accuracy) and large-capacity storage systems (single-scan data volume up to 200 GB) for large-scale experiments.
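Radiometric correction is commonly implemented as a white/dark reference calibration, in which each raw measurement is rescaled between a dark-current frame and a frame of a white reference target. The Python sketch below assumes that form for a single pixel spectrum. The function name and the digital counts are hypothetical.

```python
def calibrate_reflectance(raw, white, dark, eps=1e-9):
    """Relative reflectance per band: (raw - dark) / (white - dark).

    raw, white, dark: equal-length lists of digital counts for one pixel,
    from the sample, the white reference, and the dark frame respectively.
    """
    return [(r - d) / max(w - d, eps) for r, w, d in zip(raw, white, dark)]


# Hypothetical counts for three bands of one pixel.
raw_counts = [520.0, 1100.0, 300.0]
white_counts = [1020.0, 2100.0, 1100.0]
dark_counts = [20.0, 100.0, 100.0]
reflectance = calibrate_reflectance(raw_counts, white_counts, dark_counts)
print(reflectance)
```

Dividing by the white reference band by band removes the spectral shape of the lamp and the detector response, so that spectra recorded under drifting illumination remain comparable.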
4.1.2. Spectral Band Functionality and Food Inspection Applications
HSI enables functional food analysis across the visible to shortwave infrared range (400–2500 nm), where different bands correlate with specific physical or chemical attributes:
- Visible (400–760 nm):
  - Blue (450–495 nm): detects foreign objects via reflectance differences.
  - Green (495–570 nm): assesses chlorophyll peaks for vegetable freshness.
  - Red (620–700 nm): identifies bruises or meat browning using myoglobin absorption.
- Near to shortwave infrared (760–2500 nm):
  - 780, 1450, and 1940 nm (OH bands): track moisture migration in baked goods.
  - 1724 and 1762 nm (CH₂ bands): detect lipid oxidation in meat.
  - 1500–2300 nm (NH/CH bands): map proteins, starches, and carbohydrates.
These targeted spectral bands provide a solid engineering basis for building portable, real-time food inspection devices.
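A portable device built on targeted bands needs to map nominal wavelengths onto the band indices of its own sensor. The Python sketch below shows one way to do this. It assumes a hypothetical sensor sampling 900–2500 nm in 10 nm steps, so the 780 nm band listed above falls outside its range and is omitted.

```python
def nearest_band(wavelengths_nm, target_nm):
    """Index of the sensor band whose centre wavelength is closest to target_nm."""
    return min(range(len(wavelengths_nm)),
               key=lambda i: abs(wavelengths_nm[i] - target_nm))


def select_bands(wavelengths_nm, targets_nm):
    """Map each target wavelength to its nearest band index."""
    return {target: nearest_band(wavelengths_nm, target) for target in targets_nm}


def reduce_spectrum(spectrum, band_indices):
    """Keep only the selected bands of a full pixel spectrum."""
    return [spectrum[i] for i in band_indices]


# Hypothetical sensor grid: 161 bands from 900 to 2500 nm.
wavelengths = list(range(900, 2501, 10))
targets = [1450, 1724, 1762, 1940]  # OH and CH2 bands named in the text
bands = select_bands(wavelengths, targets)
print(bands)
```

Reducing each pixel from 161 values to the four selected bands shrinks the data volume by a factor of about forty, which is the kind of saving that makes real-time processing on embedded hardware plausible.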
4.1.3. Data Processing and Classification Enhancement
Unlike RGB-based methods, HSI excels in distinguishing visually similar objects through subtle spectral variances [105]. Data preprocessing, such as Savitzky–Golay (SG) smoothing or spectral unmixing, is essential for maintaining feature integrity. For instance, SG filtering with a window length of 7–11 points achieves 92.3% feature retention at a 40 dB SNR. Figure 5 (right) shows the improvement in spectral curve quality after smoothing, which supports more robust classification.
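Savitzky–Golay smoothing fits a low-order polynomial within a sliding window, which for a fixed window reduces to a weighted moving average. The Python sketch below uses the standard 7-point quadratic/cubic coefficients (−2, 3, 6, 7, 6, 3, −2)/21. Leaving the first and last three samples unchanged is a simplifying assumption of this sketch, and the example spectrum is hypothetical.

```python
SG7_COEFFS = (-2, 3, 6, 7, 6, 3, -2)  # 7-point quadratic/cubic weights
SG7_NORM = 21.0


def savgol7(spectrum):
    """Smooth a spectrum with a 7-point Savitzky-Golay filter.

    Interior samples are replaced by the weighted window average;
    the three samples at each end are returned unchanged.
    """
    half = len(SG7_COEFFS) // 2
    smoothed = list(spectrum)
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(SG7_COEFFS, window)) / SG7_NORM
    return smoothed


# Hypothetical reflectance curve with one noisy sample at index 4.
noisy = [0.30, 0.31, 0.32, 0.33, 0.45, 0.35, 0.36, 0.37, 0.38]
print([round(v, 3) for v in savgol7(noisy)])
```

Unlike a plain moving average, this filter reproduces any quadratic trend exactly, which is why it attenuates noise while largely preserving the height and width of absorption features. Library implementations such as scipy.signal.savgol_filter additionally handle the window edges.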
4.2. Deep Learning Approaches Based on Convolutional Neural Networks
4.2.1. Background on Hyperspectral Analysis with CNNs
The success of AlexNet marked a pivotal shift in hyperspectral analysis, establishing convolutional neural networks (CNNs) as a foundational deep learning tool in this domain [106,107]. Early CNN-based models employed one-dimensional convolution layers, Batch Normalization, and PReLU activation to effectively extract spectral features, achieving promising classification accuracy even under limited training data conditions [108].
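The building blocks named above (one-dimensional convolution, Batch Normalization, and PReLU) can be written out as a forward pass in plain Python. This is a sketch of a single channel only, not the architecture of [108]. The kernel, the PReLU slope, and the two six-band spectra are hypothetical, and in a trained network the kernel and slope would be learned.

```python
import math


def conv1d(spectrum, kernel, bias=0.0):
    """Valid 1-D cross-correlation along the spectral axis."""
    k = len(kernel)
    return [sum(kernel[j] * spectrum[i + j] for j in range(k)) + bias
            for i in range(len(spectrum) - k + 1)]


def batch_norm(feature_maps, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel with statistics pooled over the batch and all positions."""
    values = [v for fmap in feature_maps for v in fmap]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(var + eps)
    return [[(v - mean) * scale + beta for v in fmap] for fmap in feature_maps]


def prelu(feature_maps, slope=0.25):
    """Identity for positive inputs, scaled by slope for negative inputs."""
    return [[v if v > 0 else slope * v for v in fmap] for fmap in feature_maps]


# Toy batch: one spectrum with a peak and one flat spectrum.
batch = [[0.2, 0.4, 0.9, 0.4, 0.2, 0.1],
         [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]]
kernel = [-1.0, 2.0, -1.0]  # responds to local spectral peaks
features = prelu(batch_norm([conv1d(s, kernel) for s in batch]))
print(features)
```

Because the kernel slides along the wavelength axis, the same small set of weights detects a given absorption shape wherever it occurs, which keeps the parameter count low when training data are limited.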
4.2.2. Challenges in Modeling Complex Food Structures
While deep learning techniques have demonstrated strong performance in capturing spectral distinctions among food components, a critical limitation persists in modeling the underlying mechanisms that govern the discrimination of complex food structures. Specifically, many food products comprise multiple coexisting constituents—such as lipids, carbohydrates, and proteins—whose spectral features often exhibit significant overlap and nonlinear mixing [109]. In such cases, the spectral signatures do not correspond to isolated compounds but to complex, spatially intertwined matrices, making the interpretation of learned features inherently ambiguous. Current models predominantly rely on data-driven correlations without explicit consideration of biochemical interactions or physicochemical dependencies between constituents. As a result, although models may achieve high classification accuracy at the macro level, they often lack interpretability and robustness in tasks requiring fine-grained differentiation, such as estimating fat–protein ratios in emulsified products or distinguishing carbohydrate layers in cooked or processed foods.
Moreover, the low spatial resolution of hyperspectral data and the presence of spectral–spatial redundancy further complicate the accurate parsing of constituent-specific information. Existing fusion-based approaches—while improving robustness—do not fully resolve these ambiguities, particularly in scenarios involving heterogeneous, multi-phase food matrices. Therefore, the mechanistic basis by which deep networks differentiate among closely related or mixed spectral components remains an open problem, warranting further research in model interpretability, multi-modal integration, and constituent-level feature disentanglement.
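To make the overlap problem concrete, the sketch below applies the simpler linear mixing model, in which a pixel spectrum is treated as a weighted sum of constituent spectra and the weights are recovered by least squares. This is the idealized baseline that breaks down when mixing is nonlinear, not a method drawn from the cited work. The Gaussian "endmember" profiles, their band centres and widths, and the abundance values are purely illustrative assumptions and are not calibrated spectra.

```python
import numpy as np


def gaussian_band(wavelengths, centre, width):
    """Hypothetical single-peak absorption profile used as a stand-in endmember."""
    return np.exp(-((wavelengths - centre) / width) ** 2)


def unmix_linear(pixel, endmembers):
    """Estimate abundances under a linear mixing model.

    pixel: (bands,); endmembers: (bands, n_constituents).
    Negative estimates are clipped and the result rescaled to sum to one,
    a crude substitute for properly constrained least squares.
    """
    abundances, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
    abundances = np.clip(abundances, 0.0, None)
    total = abundances.sum()
    return abundances / total if total > 0 else abundances


wavelengths = np.linspace(900, 1700, 160)  # nm, assumed NIR range
# Overlapping, purely illustrative profiles for lipid, protein, carbohydrate.
endmembers = np.column_stack([
    gaussian_band(wavelengths, 1210, 60),
    gaussian_band(wavelengths, 1510, 80),
    gaussian_band(wavelengths, 1450, 90),
])
true_abundances = np.array([0.5, 0.3, 0.2])
pixel = endmembers @ true_abundances
estimated = unmix_linear(pixel, endmembers)
print(np.round(estimated, 3))
```

With noise-free, strictly linear data the abundances are recovered; as the profiles overlap more heavily or the mixing becomes nonlinear, the least-squares problem becomes ill-conditioned and the estimates degrade, which is the ambiguity described above.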
4.2.3. Representative Fusion Models
As noted in Section 4.2.2, low spatial resolution and spectral–spatial redundancy complicate the extraction of constituent-specific information, and existing fusion-based approaches, although more robust, do not fully resolve these difficulties in heterogeneous, multi-phase food matrices [110].
To address these limitations, several spectral–spatial fusion models have been proposed, offering improvements in segmentation and classification tasks by associating spectral features (e.g., moisture, lipid content) with spatial patterns within food matrices [111]. However, their performance is still limited in highly complex or mixed food systems, signaling a clear need for biologically informed and interpretable modeling strategies. Figure 6 illustrates a comparison of HSI region segmentation between remote sensing tasks (left) and food applications (right), emphasizing the adaptability of these techniques across domains.
4.3. Challenges and Future Research Directions
Despite its potential, hyperspectral imaging faces several technical bottlenecks in food applications:
- Data and computation efficiency: High-dimensional data increases memory and runtime requirements. Reducing algorithmic complexity while maintaining accuracy remains an urgent goal.
- Training sample limitations: Many deep models rely on large, labeled datasets, which are difficult to obtain in food scenarios. Research into few-shot learning, data augmentation, and transfer learning is essential.
- Model sparsity and deployment: Lightweight models with high sparsity and robust performance are necessary for real-world use in handheld or embedded devices.
- Classifier innovation: Integrating multiple classifiers or novel architectures could enhance accuracy and stability in complex food environments.
Future work must balance precision, efficiency, and generalization to move hyperspectral food recognition from lab settings toward scalable, real-time deployment.
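One common way to reduce the memory and runtime cost of high-dimensional cubes is to project the spectral axis onto a few principal components before any model is trained. The sketch below shows this with a plain NumPy SVD. It is offered as one possible approach under stated assumptions and not as a technique endorsed by the studies reviewed here; the cube size, band count, number of latent factors, and noise level are synthetic choices.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project a (rows, cols, bands) cube onto its leading principal components."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    centred = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal directions in band space.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T
    variance = singular_values ** 2
    explained = variance[:n_components].sum() / variance.sum()
    return scores.reshape(rows, cols, n_components), explained


rng = np.random.default_rng(0)
# Synthetic cube: 3 latent factors spread over 100 bands, plus a little noise.
latent = rng.random((32 * 32, 3))
loadings = rng.random((3, 100))
cube = (latent @ loadings + rng.normal(0, 0.01, (32 * 32, 100))).reshape(32, 32, 100)
reduced, explained = pca_reduce(cube, n_components=3)
print(reduced.shape, round(float(explained), 3))
```

Here 100 bands are compressed to 3 channels per pixel, so downstream models see a much smaller input; real food spectra will generally need more components and a check that task-relevant variance is retained.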
5. Application of Hyperspectral Technology in Food Inspection
Hyperspectral imaging (HSI) has become a transformative tool in modern food inspection due to its ability to conduct non-destructive, real-time, and composition-sensitive analysis. By capturing rich spectral–spatial information, HSI goes beyond traditional surface imaging and offers robust solutions for food classification, quality control, and safety monitoring. This section explores both foundational and emerging applications of HSI in food detection, emphasizing its integration with AI and the trajectory for future innovation.
5.1. Foundations of Hyperspectral Technology in Food Detection
5.1.1. Principles and Analytical Methods
At its core, HSI measures the spectral reflectance of materials, generating rich datasets that support both qualitative identification (e.g., variety classification) and quantitative evaluation (e.g., moisture or sugar content). This spectral-based approach enables inspection without damaging the food, making it well-suited for scenarios such as type recognition, freshness grading, and quality assessment.
Figure 7 illustrates major use cases where HSI has been deployed in non-destructive food detection systems—from surface defect detection to internal spoilage assessment.
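Because HSI analysis works on reflectance rather than raw sensor counts, acquisitions are commonly normalized against white and dark reference frames using R = (raw − dark) / (white − dark). The following sketch implements that standard correction; the function name, array shapes, and count values are assumptions chosen for illustration and are not taken from the reviewed systems.

```python
import numpy as np


def calibrate_reflectance(raw, white, dark, eps=1e-9):
    """Relative reflectance from raw counts and white/dark reference frames.

    All arrays share the same shape (or broadcast), e.g. (rows, cols, bands).
    """
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    span = np.maximum(white - dark, eps)  # guard against division by zero
    return (raw - dark) / span


# Toy frames: 2 x 2 pixels, 4 bands, with made-up sensor counts.
dark = np.full((2, 2, 4), 100.0)
white = np.full((2, 2, 4), 4000.0)
raw = np.full((2, 2, 4), 2050.0)
reflectance = calibrate_reflectance(raw, white, dark)
print(reflectance[0, 0])  # [0.5 0.5 0.5 0.5]
```

The correction removes the sensor's dark current and the illumination profile, so that spectra from different sessions or instruments become comparable.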
5.1.2. Empirical Applications
Hyperspectral imaging has been widely applied in diverse food detection scenarios, owing to its ability to extract rich spectral–spatial features non-destructively. For instance, it has been used to classify different varieties of vinegar and corn seeds [112] based on subtle spectral differences. Wang Jun proposed the Residual Attention Hierarchical Regression Network (RA-HRNet), which enhanced image reconstruction performance while reducing computational complexity, achieving 94.7% accuracy in identifying brewing sorghum varieties [113].
In food quality inspection, hyperspectral imaging has enabled accurate prediction of deoxynivalenol (DON) contamination levels in wheat flour [114], estimation of soluble solid content (SSC) in apples [115], and evaluation of sugar and moisture content in snow pears [116]. The technology has also proven effective in diagnosing nutrient deficiencies in plants, such as nitrogen, phosphorus, and potassium imbalance in tomato leaves [117].
Freshness discrimination is another key application area. Near-infrared hyperspectral imaging has demonstrated high accuracy in detecting internal mold in peanuts and differentiating freshness levels in pork, beef, and fish products based on biochemical markers such as myoglobin oxidation [118]. A detailed summary of these applications is presented in Table 5.
5.2. Emerging Applications Enabled by Hyperspectral and AI Technologies
Although hyperspectral imaging (HSI) has been widely applied in conventional food quality inspection, recent advances in artificial intelligence (AI), sensor fusion, and high-resolution imaging have significantly expanded its scope. These developments have enabled novel applications in intelligent food analysis, particularly in the prediction of nutritional value and spoilage levels in complex food matrices.
5.2.1. Nutritional Monitoring: Semantic Segmentation and Deep Estimation
Deep learning algorithms, particularly those based on semantic segmentation, have facilitated fine-grained, pixel-level analysis of heterogeneous food compositions. Models such as the residual U-Net architecture have demonstrated efficacy in differentiating between food categories (e.g., meats, vegetables, and cereals) within a single dish. By isolating and characterizing each component, these models allow for accurate nutritional profiling.
Advanced systems, such as the CNTA’s “Food Safety 4.0” platform, combine HSI with deep learning to infer macronutrient distributions in real time. Empirical results suggest that such systems reduce estimation error by up to 40% relative to conventional image-based calorie estimation approaches. In addition, integrating 3D food reconstruction with real-time nutritional databases enables more accurate caloric assessment (e.g., an estimate of 215 kcal for a mixed portion), even in diverse meal contexts.
Nonetheless, these technologies face persistent challenges in analyzing complex dishes, particularly those with overlapping or occluded ingredients. Such scenarios introduce spectral mixing and shape distortion, complicating the isolation of individual food items. Ongoing research addresses these limitations through the application of multi-spectral or hyper-multi-spectral fusion techniques, aiming to improve discrimination accuracy under non-ideal conditions.
5.2.2. Intelligent Storage and Spoilage Detection
Recent innovations in smart food storage leverage hyperspectral and multi-sensor technologies for dynamic monitoring of food degradation processes. Smart refrigerators, such as the Meiling CHiQ series, employ image recognition systems and embedded databases to identify over 500 food types, track storage durations, and generate real-time spoilage alerts. Other systems, like those developed by Hisense, incorporate RFID tagging and load-cell-based weight sensors for automatic inventory tracking and lifecycle prediction.
From a biochemical standpoint, HSI enables non-destructive quantification of spoilage indicators—such as chlorophyll degradation in vegetables or protein breakdown in meats—by detecting subtle changes in reflectance spectra associated with moisture loss, microbial activity, and oxidation. Despite these advances, accurately assessing the freshness of unpackaged or non-standardized foods remains a technical hurdle. Current research focuses on machine learning-enhanced spectral interpretation, which seeks to improve model robustness across variable conditions, including lighting, packaging interference, and food heterogeneity. Figure 8 presents representative portable hyperspectral detection devices used in supporting these freshness and storage systems.
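A simple way to turn reflectance changes into a spoilage-related indicator is a normalized difference between two bands. The sketch below uses a red band near 670 nm, where chlorophyll absorbs strongly, and a near-infrared band near 800 nm. It is a hedged illustration rather than the method of any system cited here; the band choices, the wavelength grid, and the reflectance values for the "fresh" and "aged" pixels are assumptions.

```python
import numpy as np


def band_index(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    return int(np.argmin(np.abs(np.asarray(wavelengths) - target_nm)))


def normalized_difference(cube, wavelengths, nir_nm=800.0, red_nm=670.0, eps=1e-9):
    """Per-pixel (NIR - red) / (NIR + red) from a (rows, cols, bands) reflectance cube."""
    nir = cube[..., band_index(wavelengths, nir_nm)]
    red = cube[..., band_index(wavelengths, red_nm)]
    return (nir - red) / (nir + red + eps)


wavelengths = np.arange(400, 1001, 10)  # nm, assumed sensor grid
cube = np.zeros((1, 2, wavelengths.size))
red, nir = band_index(wavelengths, 670), band_index(wavelengths, 800)
cube[0, 0, [red, nir]] = [0.05, 0.50]  # hypothetical fresh leaf pixel
cube[0, 1, [red, nir]] = [0.20, 0.40]  # hypothetical aged leaf pixel
index = normalized_difference(cube, wavelengths)
print(np.round(index, 2))
```

As chlorophyll degrades, red reflectance rises and the index falls, so a per-pixel index map gives a rough, interpretable freshness cue that a learned model can refine.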
5.2.3. Personalized Diets and Automated Serving Robots
Existing cross-modal systems leverage CNNs and large food databases to provide personalized dietary recommendations. For example, HealthBenefit’s system identifies dish components and tailors nutrition advice to individual health profiles.
Current AI applications, such as Cal AI’s system, use image recognition and LLMs to generate health-specific recipes. Robotics in this domain has progressed from simple tasks to full-process automation, with systems such as Sweeper and LAVA achieving high precision in harvesting and cooking tasks.
5.3. Summary, Limitations, and Future Directions
Hyperspectral imaging has substantially enhanced food inspection capabilities, enabling high-precision detection of composition and freshness, intelligent food tracking, and personalized health interventions. Its integration with AI facilitates context-aware, data-driven decision-making across the entire food lifecycle.
However, several technical and practical challenges persist:
- Data limitations: existing datasets remain small, insufficiently diverse, or biased toward specific food types and imaging conditions.
- Computational burden: current models are resource-intensive, posing deployment challenges in edge and embedded environments.
- Limited model generalization: performance often deteriorates under real-world conditions, such as complex plating, poor lighting, and ingredient occlusion.
To address these challenges, future research should focus on the following:
- Expanding hyperspectral datasets to include richer annotations, a broader range of food categories, and more representative imaging conditions.
- Integrating multi-modal sensing technologies, such as NIR, Raman, thermal, and depth imaging, to enrich data diversity and improve recognition accuracy.
- Developing lightweight models capable of efficient inference on mobile and embedded systems.
- Exploring omics-level data fusion, combining spectral information with genomic, metabolomic, or microbiome data to enable truly personalized nutrition interventions.
Looking forward, advancements in food image recognition—particularly for complex and culturally rich cuisines like Chinese food—should prioritize the following key directions:
- (1) Construction of Multimodal Datasets: To better capture the diversity and cultural context of Chinese cuisine, future datasets should integrate heterogeneous data types, including spectral and RGB images, nutritional information, textual labels (e.g., ingredient lists, dish names), and regional or cultural annotations. Developing automated annotation systems will be crucial to reducing labeling costs and facilitating the creation of large-scale, high-quality datasets that are both representative and generalizable across geographic regions.
- (2) Optimization of Deep Learning Models: Achieving high classification accuracy under real-world constraints requires models that balance precision with efficiency. Research should emphasize the following: lightweight neural networks and sparse classifiers suitable for mobile and embedded devices; efficient spectral–spatial feature fusion techniques; and few-shot and data-efficient learning algorithms that perform well with limited samples. Such models will enhance usability in resource-constrained environments, including handheld devices and low-power consumer electronics.
- (3) Advanced Applications of Hyperspectral Technology: Beyond classification, hyperspectral imaging should be further leveraged for food safety and health monitoring, particularly in detecting trace elements, heavy metals, pesticide residues, and foodborne allergens or contaminants. These expanded applications would significantly enhance the practical impact of HSI in daily food inspection and public health assurance.
- (4) Development of Cross-Cultural and Generalizable Models: To support global food AI applications, recognition systems must adapt across regions and cultures. This necessitates the following: aggregating datasets representing diverse food traditions; training models on culturally heterogeneous data; and incorporating knowledge transfer techniques to bridge gaps between different cuisines.
Such efforts will facilitate the digital preservation and global dissemination of Chinese culinary heritage while also enabling cross-border applications in nutrition research, ingredient authentication, and dietary personalization.
In conclusion, this review underscores that food image recognition, particularly in the context of Chinese cuisine, requires solutions that transcend traditional visual modeling. As multimodal data integration, AI model optimization, and hyperspectral sensing technologies continue to mature, they will form a comprehensive framework for intelligent food analysis. With ongoing research into data diversity, algorithmic innovation, and cross-domain deployment, these technologies will not only enable smarter dietary recommendations and enhanced food safety management but also offer new methodologies for related domains such as ingredient quality grading, freshness monitoring, and cooking process recognition. Ultimately, these advancements will contribute to healthier lifestyles, safer food systems, and a deeper scientific understanding of food.
6. Conclusions
This review has systematically examined the evolution and current landscape of food image recognition technologies, with a particular emphasis on the classification and detection of Chinese cuisine images. The discussion traced progress from early RGB-based deep learning frameworks—such as convolutional neural networks (CNNs) and transfer learning models—to emerging hyperspectral imaging (HSI) approaches that aim to overcome inherent limitations in identifying visually complex, culturally diverse, and compositionally intricate food items.
While RGB-based methods have achieved moderate success in structured classification tasks, their performance remains constrained by several key factors, including high visual similarity between distinct dishes, sensitivity to environmental variables (e.g., lighting, occlusion), and limited generalization across dish variants. These limitations are especially pronounced in the context of Chinese cuisine, which is characterized by a vast array of regional styles, diverse ingredient combinations, and nuanced preparation techniques.
Hyperspectral imaging, with its capacity to capture rich spectral signatures at the material level, provides a non-destructive means to analyze food properties such as freshness, nutritional composition, and physicochemical quality. The fusion of HSI with advanced deep learning techniques—including knowledge distillation, few-shot learning, and spectral–spatial feature extraction—has significantly enhanced classification accuracy and expanded the applicability of food recognition systems beyond laboratory conditions.
Nevertheless, several challenges remain. These include the scarcity of large-scale, annotated hyperspectral food datasets, the computational demands of high-dimensional spectral modeling, and the discrepancy between controlled experimental settings and complex real-world deployment scenarios. In particular, Chinese cuisine presents unique challenges due to its ingredient-level heterogeneity, frequent use of mixed and overlapping components, and the prevalence of sauces and thermal transformations that obscure visual and spectral cues. The intrinsic complexity of such dishes—where a single plate may comprise multiple ingredients, each undergoing different preparation processes—renders conventional recognition and detection algorithms insufficient. Addressing these challenges will require more robust, context-aware models that can disentangle overlapping signals and adapt to the dynamic, culturally embedded nature of food presentation in Chinese culinary contexts.
References
- 1. Chang Y. Research on Dietary Nutrition and Health Issues Among College Students. J. Heilongjiang Coll. Educ. 2014, 33, 195–196.
- 2. Estay K., Proserpio C., Cattaneo C., Laureati M. Children’s food neophobia across different socioeconomic backgrounds in Chile: Exploring acceptance and willingness to try unfamiliar vegetables. Food Qual. Preference 2025, 129, 105511. doi:10.1016/j.foodqual.2025.105511
- 3. Liu Y., Liu C., Sun L., Li M., Zhu Y., Deng W., Yu J., Zhang W., Song Z. Investigating flavor and quality characteristics in Chinese bacon from different regions using integrated GC-IMS, electronic sensory assessment, and sensory analysis. Meat Sci. 2025, 220, 109709. doi:10.1016/j.meatsci.2024.109709. PMID: 39549429
- 4. Ding H., Tian J., Yu W., Wilson D.I., Young B.R., Cui X., Xin X., Wang Z., Li W. The application of artificial intelligence and big data in the food industry. Foods 2023, 12, 4511. doi:10.3390/foods12244511. PMID: 38137314. PMCID: PMC10742996
- 5. Namkhah Z., Fatemi S.F., Mansoori A., Nosratabadi S., Ghayour-Mobarhan M., Sobhani S. Advancing sustainability in the food and nutrition system: A review of artificial intelligence applications. Front. Nutr. 2023, 10, 1295241. doi:10.3389/fnut.2023.1295241. PMID: 38035357. PMCID: PMC10687214
- 6. Raki H., Aalaila Y., Taktour A., Peluffo-Ordóñez D. Combining AI tools with non-destructive technologies for crop-based food safety: A comprehensive review. Foods 2023, 13, 11. doi:10.3390/foods13010011. PMID: 38201039. PMCID: PMC10777928
- 7. Min W., Jiang S., Liu L., Rui Y., Jain R. A survey on food computing. ACM Comput. Surv. 2019, 52, 1–36. doi:10.1145/3329168
- 8. Shen C., Wang R., Nawazish H., Wang B., Cai K., Xu B. Machine vision combined with deep learning–based approaches for food authentication: An integrative review and new insights. Compr. Rev. Food Sci. Food Saf. 2024, 23, e70054. doi:10.1111/1541-4337.70054. PMID: 39530613
