# Seeing What’s on the Plate: Composition-Aware Fine-Grained Food Recognition for Dietary Analysis

**Authors:** Linghui Ye, Qingbing Sang, Zhiyong Xiao

PMC · DOI: 10.3390/foods15050931 · Foods · 2026-03-06

## TL;DR

This paper introduces a fine-grained food recognition framework that distinguishes visually similar dishes by subtle differences in ingredient composition, spatial distribution, and structural organization, in support of image-based dietary analysis and health monitoring.

## Contribution

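A fine-grained food classification framework that enhances spatial relation modeling and key-region awareness, paired with a lightweight multi-branch fusion strategy and a token-aware subcenter-based classification head, for reliable dietary analysis.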

## Key findings

- The framework achieves 82.28% accuracy on the FoodX-251 dataset and 82.64% on the UEC Food-256 dataset.
- It is designed for stable recognition of the same food category under real-world variations in appearance, viewpoint, and background.
- The approach improves discrimination among visually similar dishes with different ingredient compositions.

## Abstract

Reliable visual characterization of food composition is a fundamental prerequisite for image-based dietary assessment and health-oriented food analysis. In fine-grained food recognition, models often suffer from large intra-class variation and small inter-class differences, where visually similar dishes exhibit subtle yet discriminative differences in ingredient compositions, spatial distribution, and structural organization, which are closely associated with different nutritional characteristics and health relevance. Capturing such composition-related visual structures in a non-invasive manner remains challenging. In this work, we propose a fine-grained food classification framework that enhances spatial relation modeling and key-region awareness to improve discriminative feature representation. The proposed approach strengthens sensitivity to composition-related visual cues while effectively suppressing background interference. A lightweight multi-branch fusion strategy is further introduced for the stable integration of heterogeneous features. Moreover, to support reliable classification under large intra-class variation, a token-aware subcenter-based classification head is designed. The proposed framework is evaluated on the public FoodX-251 and UEC Food-256 datasets, achieving accuracies of 82.28% and 82.64%, respectively. Beyond benchmark performance, the framework is designed to support practical image-based dietary analysis under real-world dining conditions, where variations in appearance, viewpoint, and background are common. By enabling stable recognition of the same food category across diverse acquisition conditions and accurate discrimination among visually similar dishes with different ingredient compositions, the proposed approach provides reliable food characterization for dietary interpretation, thereby supporting practical dietary monitoring and health-oriented food analysis applications.
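To make the idea of a subcenter-based classification head more concrete, the following is a minimal sketch of a generic subcenter classifier, in which each class keeps several prototype vectors and a feature is scored by its best-matching prototype. It is not the authors' token-aware design, whose details are in the full paper. The function names, the `scale` parameter, and the toy 2-D prototypes are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit length (unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def subcenter_logits(feature, subcenters, scale=1.0):
    """Score a feature against every class.

    subcenters[c] is a list of prototype vectors for class c.
    The class logit is the highest cosine similarity over that
    class's prototypes, multiplied by a hypothetical `scale`.
    """
    return [
        scale * max(cosine(feature, center) for center in centers)
        for centers in subcenters
    ]


def predict(feature, subcenters):
    """Return the index of the class with the largest logit."""
    logits = subcenter_logits(feature, subcenters)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy example (illustrative values only): two classes, two subcenters each.
centers = [
    [[1.0, 0.0], [0.0, 1.0]],    # class 0: two distinct appearance modes
    [[-1.0, 0.0], [0.0, -1.0]],  # class 1: two distinct appearance modes
]
label_a = predict([0.1, 2.0], centers)
label_b = predict([-3.0, 0.2], centers)
```

Taking the maximum over subcenters lets one class cover several visual modes (for example, the same dish plated in different ways) without forcing them toward a single prototype, which is the usual motivation for subcenter heads when intra-class variation is large.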

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12985250/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12985250/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12985250/full.md

---
Source: https://tomesphere.com/paper/PMC12985250