VISTA: Vision Transformer enhanced by U-Net and Image Colorfulness Frame Filtration for Automatic Retail Checkout
Md. Istiak Hossain Shihab, Nazia Tasnim, Hasib Zunair, Labiba Kanij Rupty, Nabeel Mohammed

TL;DR
This paper presents a novel approach combining Vision Transformers, U-Net segmentation, and image colorfulness filtering to improve product recognition in automated retail checkout, addressing occlusion, domain bias, and fast motion challenges.
Contribution
It introduces a framework that pairs unified product-and-hand segmentation and entropy masking with Vision Transformer (ViT) classification, designed for real-world retail scenarios with domain bias and occlusion.
Findings
Achieved 3rd place in the AI City Challenge 2022 with an F1 score of 0.4545
Developed a custom frame filtering metric to discard irrelevant frames
Demonstrated effectiveness of combined segmentation, classification, and filtering methods
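The title names image colorfulness as the basis of the frame filter. The page does not give the authors' exact metric, so the sketch below assumes the standard Hasler and Süsstrunk colorfulness measure, M = sigma_rgyb + 0.3 * mu_rgyb, computed from the opponent channels rg = R - G and yb = 0.5(R + G) - B. It also assumes that frames showing only the empty tray score low and can be dropped. The helper names `colorfulness` and `filter_frames` and the threshold value of 30 are hypothetical choices for illustration.

```python
import numpy as np

def colorfulness(frame: np.ndarray) -> float:
    """Hasler & Suesstrunk colorfulness of an H x W x 3 RGB frame (assumed metric)."""
    rgb = frame.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                # red-green opponent channel
    yb = 0.5 * (r + g) - b    # yellow-blue opponent channel
    # Combined spread and combined mean of the two opponent channels.
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return float(std_root + 0.3 * mean_root)

def filter_frames(frames, threshold: float = 30.0) -> list[int]:
    """Return indices of frames whose colorfulness exceeds a (hypothetical) threshold."""
    return [i for i, f in enumerate(frames) if colorfulness(f) > threshold]

# Usage: a neutral gray frame (empty tray) is dropped, a saturated frame is kept.
gray = np.full((4, 4, 3), 128, dtype=np.uint8)
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
kept = filter_frames([gray, red])
```

A neutral frame has zero opponent-channel energy, so it scores exactly 0. A strongly colored product frame scores well above the cutoff. In practice the threshold would have to be tuned on validation videos.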
Abstract
Multi-class product counting and recognition identifies product items in images or videos for automated retail checkout. The task is challenging because of real-world conditions: occlusions where product items overlap, fast movement on the conveyor belt, strong similarity in the overall appearance of the scanned items, novel products, and the negative impact of misidentifying items. Furthermore, there is a domain bias between the training and test sets: the provided training dataset consists of synthetic images, whereas the test videos contain foreign objects such as hands and the tray. To address these issues, we propose to segment and classify individual frames from a video sequence. The segmentation method consists of unified single-product-item and hand segmentation, followed by entropy masking to address the domain bias problem. The multi-class…
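The abstract names entropy masking but does not define it here. One common reading, assumed for this sketch, computes the Shannon entropy of the intensity histogram in a small window around each pixel and keeps pixels above a threshold. Under this reading, textured regions such as products and hands survive and flat regions such as a plain tray are masked out. The window size, number of bins, threshold, and the helper names `local_entropy` and `entropy_mask` are all hypothetical. scikit-image's `skimage.filters.rank.entropy` provides an optimized local-entropy filter that could replace the hand-written loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_entropy(gray: np.ndarray, window: int = 5, bins: int = 16) -> np.ndarray:
    """Per-pixel Shannon entropy (bits) of quantized intensities in a window x window patch.

    `window` must be odd so the output has the same shape as `gray`.
    """
    q = (gray.astype(np.int64) * bins) // 256          # quantize 8-bit values into `bins` levels
    pad = window // 2
    q = np.pad(q, pad, mode="reflect")                  # keep output size equal to input size
    area = window * window
    ent = np.zeros(gray.shape, dtype=np.float64)
    for k in range(bins):
        # Fraction of pixels in each window that fall into bin k.
        p = sliding_window_view(q == k, (window, window)).sum(axis=(-2, -1)) / area
        with np.errstate(divide="ignore", invalid="ignore"):
            ent -= np.where(p > 0, p * np.log2(p), 0.0)  # treat 0 * log(0) as 0
    return ent

def entropy_mask(gray: np.ndarray, threshold: float = 1.0,
                 window: int = 5, bins: int = 16) -> np.ndarray:
    """Boolean mask of 'textured' pixels whose local entropy exceeds `threshold`."""
    return local_entropy(gray, window, bins) > threshold

# Usage: a flat region is fully masked out; a high-contrast texture is fully kept.
flat = np.full((16, 16), 200, dtype=np.uint8)
checker = ((np.indices((16, 16)).sum(axis=0) % 2) * 255).astype(np.uint8)
flat_mask = entropy_mask(flat, threshold=0.5)
checker_mask = entropy_mask(checker, threshold=0.5)
```

Applied to a segmented frame, such a mask would suppress low-information background pixels before classification. This is one plausible way to reduce the gap between clean synthetic training images and cluttered test frames.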
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Image and Object Detection Techniques · Advanced Neural Network Applications
