The CORSMAL benchmark for the prediction of the properties of containers

Alessio Xompero; Santiago Donaher; Vladimir Iashin; Francesca Palermo; Gökhan Solak; Claudio Coppola; Reina Ishikawa; Yuichi Nagao; Ryo Hachiuma; Qi Liu; Fan Feng; Chuanlin Lan; Rosa H. M. Chan; Guilherme Christmann; Jyun-Ting Song; Gonuguntla Neeharika; Chinnakotla Krishna Teja Reddy; Dinesh Jain; Bakhtawar Ur Rehman; Andrea Cavallaro

arXiv:2107.12719 · cs.MM · April 22, 2022

TL;DR

This paper introduces the CORSMAL benchmark, an open framework with datasets and tasks for evaluating acoustic and visual methods to estimate container properties like capacity and content, aiding safe human-robot interactions.

Contribution

It provides a comprehensive benchmark framework, including datasets, tasks, and performance measures, for evaluating perception methods in estimating container properties.

Findings

01

Audio-only classifiers achieve up to 81% F1-score for content type classification.

02

Vision-only methods estimate container capacity with up to 65% accuracy.

03

Audio-visual approaches reach up to 97% accuracy in content level estimation.

Abstract

The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key pre-requisites for safe human-to-robot handovers. However, opaqueness and transparencies of the container and the content, and variability of materials, shapes, and sizes, make this estimation difficult. In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for the estimation of the capacity of a container, and the type, mass, and amount of its content. The framework includes a dataset, specific tasks and performance measures. We conduct an in-depth comparative analysis of methods that used this framework and audio-only or vision-only baselines designed from related works. Based on this analysis, we can conclude that audio-only and audio-visual classifiers are suitable for the estimation of the type and amount of…
