Knowledge distillation to effectively attain both region-of-interest and global semantics from an image where multiple objects appear
Seonwhee Jin

TL;DR
This paper introduces RveRNet, a novel architecture combining ROI and global context modules, enhanced by knowledge distillation from CNNs, to improve food image classification accuracy, especially for ambiguous cases.
Contribution
The paper proposes RveRNet, a new architecture that integrates ROI and global context modules, and demonstrates its effectiveness with knowledge distillation from CNNs for food classification.
Findings
RveRNet achieved a 10% higher F1 score than individual models.
Knowledge distillation from CNNs improved DeiT's robustness.
Global context modules enhanced classification of ambiguous images.
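As a rough illustration of how a CNN teacher can supervise a DeiT student, the sketch below follows the hard-label distillation objective from the original DeiT work: the class-token head is trained on the ground-truth label and the distillation-token head on the teacher's predicted label, with equal weights. Whether RveRNet uses exactly this objective is an assumption; the summary above does not say. The function names and the plain-list logits are hypothetical and chosen only for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy of one sample against an integer class index."""
    return -math.log(softmax(logits)[target])

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """DeiT-style hard distillation for a single sample (assumed setup).

    cls_logits     : student logits from the class-token head
    dist_logits    : student logits from the distillation-token head
    teacher_logits : logits from the frozen CNN teacher
    label          : ground-truth class index
    """
    # The teacher's hard decision serves as a second target.
    teacher_label = max(range(len(teacher_logits)),
                        key=teacher_logits.__getitem__)
    return (0.5 * cross_entropy(cls_logits, label)
            + 0.5 * cross_entropy(dist_logits, teacher_label))
```

In this form the student is penalized both for disagreeing with the annotation and for disagreeing with the teacher, which is one way a CNN's inductive bias can be passed to a transformer.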
Abstract
Models based on convolutional neural networks (CNNs) and transformers have steadily improved and have been applied to various downstream computer vision tasks. However, in object detection, accurately localizing and classifying the almost infinite categories of foods in images remains challenging. To address this, we first segmented the food as the region of interest (ROI) using the Segment Anything Model (SAM) and masked everything outside the ROI with black pixels. This process reduced the problem to a single classification task, for which annotation and training are much simpler than for object detection. The ROI-only images were then used as inputs to fine-tune various off-the-shelf models, each encoding its own inductive biases. Among them, Data-efficient image Transformers (DeiTs) had the best classification performance.…
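The ROI masking step described in the abstract can be sketched as follows. This is a minimal example that assumes a boolean segmentation mask for the food region is already available (for instance, one produced by SAM); it does not call SAM itself, and the function name `mask_outside_roi` is hypothetical.

```python
import numpy as np

def mask_outside_roi(image, roi_mask):
    """Return a copy of `image` with every pixel outside the ROI set to black.

    image    : H x W x 3 uint8 array (RGB)
    roi_mask : H x W boolean array, True where the food region is
    """
    if image.shape[:2] != roi_mask.shape:
        raise ValueError("image and mask must share height and width")
    # Broadcast the mask over the channel axis; keep ROI pixels, zero the rest.
    return np.where(roi_mask[..., None], image, 0).astype(image.dtype)

# Tiny 2x2 example: only the diagonal pixels belong to the ROI.
image = np.full((2, 2, 3), 200, dtype=np.uint8)
roi_mask = np.array([[True, False],
                     [False, True]])
masked = mask_outside_roi(image, roi_mask)
```

The masked image keeps its original size, so it can be passed to a standard classifier without changing the input pipeline.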
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · 3D Surveying and Cultural Heritage
Methods: Dense Connections · Softmax · Feedforward Network · Linear Layer · Attention Dropout · Dropout · Multi-Head Attention · Attention Is All You Need · Data-efficient Image Transformer · Knowledge Distillation
