Machine learning and natural language processing models to predict the extent of food processing
Nalin Arora, Sumit Bhagat, Riya Dhama, Ganesh Bagler

TL;DR
This study develops machine learning and NLP models to accurately predict the level of food processing using nutrient profiles, aiding public health efforts to identify ultra-processed foods.
Contribution
It introduces ML, deep learning, and NLP models that use nutrient data to classify food processing levels, along with a user-friendly web server for practical application.
Findings
Best models achieved F1-scores above 0.93.
Nutrient panels ranging from 13 to 102 features yield high prediction accuracy.
NLP models demonstrated state-of-the-art performance.
Abstract
The dramatic increase in consumption of ultra-processed food has been associated with numerous adverse health effects. Given these public health consequences, it is highly relevant to build computational models that predict the extent of processing of food products. We created a range of machine learning, deep learning, and NLP models to predict the extent of food processing by integrating the FNDDS dataset of food products and their nutrient profiles with their reported NOVA processing level. Starting with the full nutritional panel of 102 features, we coarse-grained the features to 65 nutrients by dropping flavonoids, and then to 13 nutrients by considering the FDA's 13-nutrient panel. LGBM Classifier and Random Forest emerged as the best models for 102 and 65 nutrients, respectively, with F1-scores of 0.9411 and 0.9345 and MCC of…
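The abstract reports model quality as F1-score and MCC (Matthews correlation coefficient) over NOVA processing levels. As a minimal sketch of how these two metrics can be computed for a multiclass problem, the following uses only the Python standard library. The toy labels are invented for illustration and are not from the FNDDS data, and the macro (unweighted) averaging of F1 is an assumption, since the text here does not say which averaging the authors used.

```python
from collections import Counter
from math import sqrt


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred):
    """Multiclass generalisation of the Matthews correlation coefficient."""
    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(v * v for v in true_counts.values())
    cov_pp = n * n - sum(v * v for v in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # MCC is conventionally set to 0 when the denominator vanishes.
    return cov_tp / denom if denom else 0.0


# Hypothetical toy data: NOVA classes 1-4, two foods per class.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 2, 3, 3, 3, 4, 1]

f1 = macro_f1(y_true, y_pred)
mcc = multiclass_mcc(y_true, y_pred)
print(f"macro F1 = {f1:.4f}, MCC = {mcc:.4f}")
```

MCC is often reported alongside F1 for imbalanced multiclass problems because it uses every cell of the confusion matrix, so a model that favours the majority class scores lower on it than on accuracy alone.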
Taxonomy
Topics: Food Industry and Aquatic Biology
