Knowledge distillation to effectively attain both region-of-interest and global semantics from an image where multiple objects appear

Seonwhee Jin

arXiv:2407.08257 · cs.CV · July 12, 2024

TL;DR

This paper introduces RveRNet, a novel architecture that combines a region-of-interest (ROI) module with a global context module and is further improved by knowledge distillation from convolutional neural network (CNN) teachers. The goal is more accurate food image classification, especially for ambiguous images.

Contribution

The paper proposes RveRNet, a new architecture that integrates ROI and global context modules, and demonstrates its effectiveness with knowledge distillation from CNNs for food classification.
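To make "combining an ROI module with a global context module" concrete, the sketch below shows one possible two-branch late-fusion classifier in NumPy. It is not RveRNet's actual architecture: the class name `TwoBranchClassifier`, the random linear "encoders" standing in for real backbones, the ReLU, and fusion by feature concatenation are all assumptions chosen for illustration. The ROI branch is meant to receive the masked image and the global branch the full image, both flattened to vectors here.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TwoBranchClassifier:
    """Hypothetical ROI + global-context late-fusion classifier (illustration only)."""

    def __init__(self, in_dim, feat_dim, num_classes):
        # Assumption: each "encoder" is a single random linear projection
        # standing in for a real backbone such as a DeiT or a CNN.
        self.w_roi = rng.normal(scale=0.1, size=(in_dim, feat_dim))
        self.w_global = rng.normal(scale=0.1, size=(in_dim, feat_dim))
        # Linear classification head over the concatenated branch features.
        self.w_head = rng.normal(scale=0.1, size=(2 * feat_dim, num_classes))

    def __call__(self, roi_input, global_input):
        f_roi = np.maximum(roi_input @ self.w_roi, 0.0)         # ROI-branch features (ReLU)
        f_glob = np.maximum(global_input @ self.w_global, 0.0)  # global-branch features (ReLU)
        fused = np.concatenate([f_roi, f_glob], axis=-1)        # late fusion by concatenation
        return softmax(fused @ self.w_head)                     # per-class probabilities
```

For example, `TwoBranchClassifier(48, 16, 5)` applied to two `(4, 48)` batches returns a `(4, 5)` array whose rows each sum to 1. Concatenation is simply the most basic way to let one head weigh local (ROI) and contextual evidence together; a real model could fuse the branches differently.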

Findings

01

RveRNet achieved a 10% higher F1 score than the individual models.

02

Knowledge distillation from CNNs improved DeiT's robustness.

03

Global context modules enhanced classification of ambiguous images.
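Finding 02 refers to distilling knowledge from CNN teachers into DeiT. For reference, the hard-label distillation objective introduced with DeiT averages two cross-entropy terms, L = ½·CE(ψ(Z_cls), y) + ½·CE(ψ(Z_dist), y_t) with y_t = argmax Z_t: one between the class-token head and the ground-truth labels, and one between the distillation-token head and the teacher's argmax predictions. The NumPy sketch below implements that objective. Whether the paper uses this exact hard variant, a soft (temperature-scaled) variant, or different weights is not stated here, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize before exponentiating
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of integer class labels."""
    lp = log_softmax(logits)
    return -lp[np.arange(len(labels)), labels].mean()

def deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels):
    """0.5 * CE(class head, true labels) + 0.5 * CE(distillation head, teacher argmax)."""
    teacher_labels = teacher_logits.argmax(axis=-1)  # hard pseudo-labels from the CNN teacher
    return 0.5 * cross_entropy(cls_logits, labels) + 0.5 * cross_entropy(dist_logits, teacher_labels)
```

Because the teacher contributes only its argmax, the student's distillation head is trained toward the teacher's decisions rather than its full probability distribution; this is the variant DeiT reported working well with a CNN teacher.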

Abstract

Models based on convolutional neural networks (CNNs) and transformers have steadily improved and have been applied to various downstream computer vision tasks. However, in object detection, accurately localizing and classifying the almost unlimited range of food categories in images remains challenging. To address this, we first segmented the food as the region-of-interest (ROI) using the Segment Anything Model (SAM) and masked everything outside the ROI as black pixels. This simplified the problem into single-label classification, for which annotation and training are much simpler than for object detection. The images in which only the ROI was preserved were used to fine-tune various off-the-shelf models, each encoding its own inductive biases. Among them, Data-efficient image Transformers (DeiTs) achieved the best classification performance.…
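The ROI-masking step described in the abstract can be sketched as follows, assuming a binary mask (for example, one produced by SAM) is already available as a boolean array with the same height and width as the image. The function name `mask_outside_roi` is hypothetical, and the SAM call that produces the mask is not shown.

```python
import numpy as np

def mask_outside_roi(image: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """Return a copy of `image` with every pixel outside the ROI set to black.

    image:    H x W x C array (e.g. uint8 RGB).
    roi_mask: H x W boolean array, True where the segmented food lies.
    """
    if image.shape[:2] != roi_mask.shape:
        raise ValueError("mask must match the image's height and width")
    masked = np.zeros_like(image)       # start from an all-black canvas
    masked[roi_mask] = image[roi_mask]  # copy only the ROI pixels back
    return masked
```

For a 2 x 2 image filled with value 200 and a mask that is True only on the diagonal, the off-diagonal pixels of the result become `[0, 0, 0]`, the diagonal keeps its original values, and the input image is left unchanged.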

Code & Models

Repositories

seonwhee-genome/rvernet (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · 3D Surveying and Cultural Heritage

Methods: Dense Connections · Softmax · Feedforward Network · Linear Layer · Attention Dropout · Dropout · Multi-Head Attention · Attention Is All You Need · Data-efficient Image Transformer · Knowledge Distillation