Rethinking Cooking State Recognition with Vision Transformers

Akib Mohammed Khan, Alif Ashrafee, Reeshoon Sayera, Shahriar Ivan, and Sabbir Ahmed

arXiv:2212.08586·cs.CV·March 7, 2023

TL;DR

This paper introduces a Vision Transformer-based approach for cooking state recognition in kitchen environments, leveraging global attention and transfer learning to significantly improve accuracy over previous methods.

Contribution

It applies Vision Transformers with transfer learning and data augmentation to enhance cooking state recognition, achieving state-of-the-art performance.

Findings

01 Achieved 94.3% accuracy on the Cooking State Recognition Challenge Dataset.

02 Outperformed existing state-of-the-art methods.

03 Demonstrated the effectiveness of global attention in distinguishing similar cooking states.

Abstract

To ensure proper knowledge representation of the kitchen environment, it is vital for kitchen robots to recognize the states of the food items that are being cooked. Although object detection and recognition have been studied extensively, the task of object state classification remains relatively unexplored. The high intra-class similarity of ingredients across different states of cooking makes the task even more challenging. Researchers have recently proposed deep learning-based strategies; however, these have yet to achieve high performance. In this study, we utilized the self-attention mechanism of the Vision Transformer (ViT) architecture for the Cooking State Recognition task. The proposed approach encapsulates the globally salient features from images, while also exploiting the weights learned from a larger dataset. This global attention allows the…
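
The global attention mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a toy, dependency-free version of single-head scaled dot-product self-attention over image patches, assuming identity query/key/value projections, no class token, and no position encodings. The helper names (`image_to_patches`, `self_attention`) and the 4x4 toy image are hypothetical and exist only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def image_to_patches(image, patch):
    """Split a 2D grayscale image (list of rows) into flattened,
    non-overlapping patch vectors, scanning left to right, top to bottom."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return patches


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real ViT learns separate projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights = []
    for query in tokens:
        # Each patch is scored against every patch, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        weights.append(softmax(scores))
    # Each output is a weighted mix of all patch vectors.
    outputs = [
        [sum(w * value[j] for w, value in zip(row, tokens))
         for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Toy 4x4 "image" split into four 2x2 patches (hypothetical data).
image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
patches = image_to_patches(image, 2)
outputs, weights = self_attention(patches)
```

Each row of `weights` is a probability distribution over all patches, so every patch can draw on information from any region of the image in a single layer. This is the property the abstract refers to as global attention.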

Code & Models

Repositories

AlifAshrafee/ViT-pytorch-for-Cooking-State-Recognition
PyTorch · Official

Taxonomy

Topics: Advanced Chemical Sensor Technologies · Nutritional Studies and Diet

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Residual Connection · Label Smoothing · Adam