# Utilizing Multimodal Logic Fusion to Identify the Types of Food Waste Sources

**Authors:** Dong-Ming Gao, Jia-Qi Song, Zong-Qiang Fu, Zhi Liu, Gang Li

PMC · DOI: 10.3390/s26030851 · Sensors (Basel, Switzerland) · 2026-01-28

## TL;DR

This paper introduces a multimodal system that combines vision and audio to accurately identify food waste sources in industrial settings, even under poor lighting conditions.

## Contribution

The paper's contribution is a novel environment-aware logic fusion framework that switches dynamically between the vision and audio models according to real-time lighting conditions.

## Key findings

- The MobileNetV3 + EMA vision model achieved 99.46% accuracy under ideal lighting (120–240 cd m⁻²).
- The multimodal fusion strategy improved classification accuracy by 39.5% in low-light conditions (12 cd m⁻²).
- Audio recognition maintained a stable 0.80 accuracy in low-light scenarios, serving as a reliable fallback.

## Abstract

What are the main findings?

The MobileNetV3 + EMA vision model achieved a peak accuracy of 99.46% under ideal lighting (120–240 cd m⁻²), while the multimodal logic fusion strategy improved classification accuracy by 39.5% in low-light conditions (12 cd m⁻²).

Audio recognition using Fast Fourier Transform (FFT) and Support Vector Machine (SVM) maintained a stable accuracy of 0.80, serving as a reliable fallback when illumination conditions caused visual recognition to fail.
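As a rough illustration of an FFT plus SVM audio pipeline, the sketch below pools the FFT power spectrum of each clip into frequency bands and trains a scikit-learn `SVC` on the result. Everything specific here is an assumption made for the example and is not taken from the paper: the 16 kHz sample rate, the 32 bands, the RBF kernel, the helper names `fft_features` and `synthetic_clip`, and above all the synthetic noisy tones that stand in for real recordings.

```python
import numpy as np
from sklearn.svm import SVC

SAMPLE_RATE = 16_000  # Hz (assumed, not from the paper)
N_BANDS = 32          # number of spectral bands (assumed)


def fft_features(clip: np.ndarray, n_bands: int = N_BANDS) -> np.ndarray:
    """Log energy in n_bands roughly equal-width frequency bands of one clip."""
    windowed = clip * np.hanning(len(clip))     # reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2  # one-sided power spectrum
    bands = np.array_split(power, n_bands)      # pool neighbouring bins
    return np.log1p(np.array([band.sum() for band in bands]))


def synthetic_clip(freq_hz: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a recording: a noisy 0.25 s tone."""
    t = np.arange(SAMPLE_RATE // 4) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t) + 0.3 * rng.standard_normal(t.size)


rng = np.random.default_rng(0)

# Two illustrative classes: label 0 is low-pitched, label 1 is high-pitched.
base_freqs = [500.0] * 40 + [3000.0] * 40
clips = [synthetic_clip(f + rng.uniform(-50, 50), rng) for f in base_freqs]
labels = np.array([0] * 40 + [1] * 40)

X = np.stack([fft_features(clip) for clip in clips])

# Train on even-indexed clips, evaluate on odd-indexed clips.
clf = SVC(kernel="rbf", gamma="scale").fit(X[::2], labels[::2])
accuracy = clf.score(X[1::2], labels[1::2])
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The printed accuracy describes only this toy data and says nothing about the 0.80 reported in the paper; real waste-processing audio would need the authors' recordings and their actual feature and hyperparameter choices.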

What are the implications of the main findings?

The proposed environment-aware logic fusion framework effectively solves the problem of visual model failure caused by significant lighting fluctuations in 24/7 industrial food waste processing.

Accurate real-time identification of waste textures enables the automated adjustment of equipment operating parameters, distinguishing between kitchen waste (requiring high pressure) and leftovers (requiring low pressure).

Identifying food waste sources in all-weather industrial environments is challenging because variable lighting can compromise visual recognition models. This study proposes and validates a robust, interpretable, and adaptive multimodal logic fusion method in which sensor dominance is assigned dynamically according to real-time illuminance. The method comprises two foundational components: (1) a lightweight MobileNetV3 + EMA model for image recognition; and (2) an audio model that uses the Fast Fourier Transform (FFT) for feature extraction and a Support Vector Machine (SVM) for classification. The key contribution of the system is its environment-aware conditional logic. The MobileNetV3 + EMA image model achieves an accuracy of 99.46% within the optimal brightness range (120–240 cd m⁻²), significantly outperforming the audio model. Outside that range, however, its performance degrades markedly, whereas the audio model maintains an illumination-independent accuracy of 0.80, a recall of 0.78, and an F1 score of 0.80. When light intensity falls below the threshold of 84 cd m⁻², the audio recognition result takes precedence. This strategy keeps classification accuracy robust under variable environmental conditions and prevents model failure. Validated on an independent test set, the fusion method achieves an overall accuracy of 90.25%, providing an interpretable and resilient solution for real-world industrial deployment.
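The conditional rule described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: only the 84 cd m⁻² threshold and the "audio takes precedence below it" rule come from the text, while the `Prediction` type, the `fuse` function, and the label strings are hypothetical names introduced here. The abstract does not say how readings above the optimal range are handled, so the sketch simply defers to vision whenever the reading is at or above the threshold.

```python
from dataclasses import dataclass

# Threshold stated in the abstract: below 84 cd m^-2 the audio result wins.
LUMINANCE_THRESHOLD = 84.0


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for one model's output."""
    label: str         # e.g. "kitchen_waste" or "leftovers"
    confidence: float  # score in [0, 1]; not used by the rule, kept for logging


def fuse(luminance: float, vision: Prediction, audio: Prediction) -> Prediction:
    """Return the prediction of the modality that dominates at this light level."""
    if luminance < LUMINANCE_THRESHOLD:
        return audio   # low light: vision is unreliable, fall back to audio
    return vision      # otherwise trust the vision model (assumption for bright scenes)


if __name__ == "__main__":
    vision = Prediction("leftovers", 0.41)
    audio = Prediction("kitchen_waste", 0.80)
    for luminance in (12.0, 84.0, 150.0):
        chosen = fuse(luminance, vision, audio)
        print(f"{luminance:6.1f} cd m^-2 -> {chosen.label}")
```

Because the rule is a single comparison on a measured quantity, each decision can be traced back to the light reading that triggered it, which is consistent with the interpretability the abstract emphasizes.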

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12899810/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12899810/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC12899810/full.md

---
Source: https://tomesphere.com/paper/PMC12899810