DKTransformer: An Accurate and Efficient Model for Fine-Grained Food Image Classification
Hongjuan Wang, Chenxi Wang, Xinjun An

TL;DR
DKTransformer is a hybrid Vision Transformer and CNN model that improves both accuracy and efficiency in fine-grained food image classification.
Contribution
Proposes DKTransformer, a hybrid model combining ViT and CNN with novel modules for efficient fine-grained food classification.
Findings
DKTransformer achieves 92.71% Top-1 accuracy on ETH Food-101 with 47 M parameters and 7.21 G FLOPs.
It reaches 90.70% accuracy on Vireo-Food-172 and 66.89% on Food-500, showing strong generalization.
The model balances accuracy and efficiency for complex food image classification tasks.
Abstract
With the rapid development of dietary analysis and health computing, food image classification has attracted increasing attention. However, this task remains challenging due to the fine-grained nature of food categories. Different classes are visually similar, whereas samples within the same class exhibit large appearance variations. Existing methods often rely excessively on either global or local features, limiting their effectiveness in complex food scenes. To address these challenges, this paper proposes DKTransformer, a lightweight hybrid architecture that combines Vision Transformers (ViT) and convolutional neural networks (CNNs) for fine-grained food image classification. Specifically, DKTransformer introduces a Local Feature Extraction (LDE) module based on depthwise separable convolution to enhance local detail modeling. Furthermore, a Multi-Scale Dilated Attention (MSDA)…
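The abstract attributes part of the model's light weight to depthwise separable convolution. As a rough illustration of why that choice saves parameters, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The channel sizes and kernel size are hypothetical values chosen for the example, not figures from the paper, and the functions are not the authors' LDE implementation.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
c_in, c_out, k = 64, 128, 3

std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # algebraically equal to 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For this layer shape the separable variant needs roughly 12% of the weights of the standard convolution, which is the kind of saving that makes such blocks attractive in lightweight hybrid designs.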
Taxonomy
Topics: Nutritional Studies and Diet · Spectroscopy and Chemometric Analyses · Advanced Chemical Sensor Technologies
