TL;DR
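This paper evaluates and compares 13 deep learning APIs for multi-label image classification. It uses semantic similarity metrics to account for differences between the APIs' label vocabularies and the benchmark vocabularies, so that predicted labels that are semantically equivalent to benchmark labels but worded differently are not counted as errors.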
Contribution
It introduces semantic similarity metrics for evaluating image classification APIs, addressing vocabulary mismatch issues in performance assessment.
Findings
The Microsoft, Imagga, and IBM APIs perform best under traditional metrics.
Under the semantic metrics, InceptionResNet-v2, Inception-v3, and ResNet50 perform best.
The evaluation uses both the Visual Genome and Open Images datasets.
Abstract
Image understanding relies heavily on accurate multi-label classification. In recent years, deep learning algorithms have become very successful at such tasks, and various commercial and open-source APIs have been released for public use. However, these APIs are often trained on different datasets, which, besides affecting their performance, can complicate their performance evaluation. The difficulty lies in the mismatch between the object-class dictionaries of the APIs' training datasets and that of the benchmark dataset: a predicted label may be semantically similar to a benchmark label yet be counted as wrong simply because the two dictionaries word it differently. To address this, we propose semantic similarity metrics that give a richer understanding of the APIs' predicted labels and thus of their performance. In this study, we evaluate and compare the performance of…
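The excerpt does not spell out how the semantic metrics are computed, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a predicted label counts as correct when its best similarity to any ground-truth label reaches a threshold, and likewise for recall in the other direction. The similarity scores in `TOY_SIMILARITY` are invented for illustration; in practice they might come from something like WordNet path similarity or word-embedding cosine similarity. All names (`TOY_SIMILARITY`, `similarity`, `best_match`, `precision_recall`) are hypothetical and are not the paper's method or any API's interface.

```python
from typing import Dict, List, Tuple

# Hypothetical similarity scores for illustration only (NOT from the paper).
# Every non-identical pair is scored below 1.0, so threshold=1.0 means exact matching.
TOY_SIMILARITY: Dict[frozenset, float] = {
    frozenset({"man", "person"}): 0.9,
    frozenset({"automobile", "car"}): 0.95,
    frozenset({"puppy", "dog"}): 0.85,
    frozenset({"tree", "plant"}): 0.6,
}


def similarity(a: str, b: str) -> float:
    """1.0 for identical labels; otherwise look up the toy table (0.0 if absent)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def best_match(label: str, candidates: List[str]) -> float:
    """Highest similarity between `label` and any candidate label (0.0 if none)."""
    return max((similarity(label, c) for c in candidates), default=0.0)


def precision_recall(predicted: List[str], truth: List[str],
                     threshold: float = 1.0) -> Tuple[float, float]:
    """Threshold-based semantic precision and recall for one image.

    A predicted label is a hit if it is close enough to some ground-truth label;
    a ground-truth label is recalled if some predicted label is close enough to it.
    """
    pred_hits = sum(1 for p in predicted if best_match(p, truth) >= threshold)
    truth_hits = sum(1 for t in truth if best_match(t, predicted) >= threshold)
    precision = pred_hits / len(predicted) if predicted else 0.0
    recall = truth_hits / len(truth) if truth else 0.0
    return precision, recall


# Example: an API's vocabulary says "person"/"car" where the benchmark says "man"/"automobile".
predicted = ["person", "car", "dog", "sky"]
truth = ["man", "automobile", "dog", "tree"]

print(precision_recall(predicted, truth, threshold=1.0))  # exact matching: (0.25, 0.25)
print(precision_recall(predicted, truth, threshold=0.8))  # semantic matching: (0.75, 0.75)
```

With exact string matching only "dog" counts, whereas the semantic variant also credits "person"/"man" and "car"/"automobile". This is the kind of vocabulary mismatch the abstract describes. The threshold is a free design choice in this sketch; a soft variant could average the `best_match` scores instead of thresholding them.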
