Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks?

Kaleb Kassaw; Francesco Luzi; Leslie M. Collins; Jordan M. Malof

arXiv:2409.10775·cs.CV·September 18, 2024

TL;DR

This paper introduces the IRUO dataset to benchmark deep learning models' robustness to partial object occlusion, revealing that ViT models outperform CNNs and approach human accuracy, especially under certain occlusion types.

Contribution

The paper presents the IRUO dataset for evaluating occlusion robustness and compares modern CNN and ViT models against human performance on occluded images.

Findings

01. ViT models outperform CNNs on occluded images.

02. Deep models are less accurate than humans under diffuse occlusion.

03. Certain occlusion types significantly reduce model accuracy.

Abstract

Image classification models, including convolutional neural networks (CNNs), perform well on a variety of classification tasks but struggle under conditions of partial occlusion, i.e., conditions in which objects are partially covered from the view of a camera. Methods to improve performance under occlusion, including data augmentation, part-based clustering, and more inherently robust architectures, including Vision Transformer (ViT) models, have, to some extent, been evaluated on their ability to classify objects under partial occlusion. However, evaluations of these methods have largely relied on images containing artificial occlusion, which are typically computer-generated and therefore inexpensive to label. Additionally, methods are rarely compared against each other, and many methods are compared against early, now outdated, deep learning models. We contribute the Image…
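The "artificial occlusion" mentioned in the abstract can be illustrated with a minimal sketch: covering a square patch of an image with a constant fill value. This is only an assumed, generic form of synthetic occlusion, not the procedure used to build the dataset described here; the function names `occlude_patch` and `occluded_fraction`, the list-of-rows image representation, and the fill value are all hypothetical choices made for illustration.

```python
import random


def occlude_patch(image, fraction, fill=0, rng=None):
    """Return a copy of `image` (a list of pixel rows) with one patch set to `fill`.

    `fraction` is the approximate share of pixels to cover, between 0.0 and 1.0.
    The patch keeps the image's aspect ratio and is placed at a random position.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Scale each side by sqrt(fraction) so the patch area is about fraction * area.
    patch_h = min(height, round(height * fraction ** 0.5))
    patch_w = min(width, round(width * fraction ** 0.5))
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)
    occluded = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(top, top + patch_h):
        for c in range(left, left + patch_w):
            occluded[r][c] = fill
    return occluded


def occluded_fraction(image, fill=0):
    """Share of pixels equal to `fill` (assumes `fill` does not occur naturally)."""
    total = sum(len(row) for row in image)
    covered = sum(1 for row in image for px in row if px == fill)
    return covered / total
```

Sweeping `fraction` over several values and re-scoring a classifier on the occluded copies gives an accuracy-versus-occlusion curve. Because such patches are synthetic, they may not reflect how real objects occlude each other, which is the limitation of artificial occlusion that the abstract points to.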

Taxonomy

Topics: Advanced Neural Network Applications

Methods: Linear Layer · Multi-Head Attention · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Vision Transformer · Softmax · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer