Audio classification of the content of food containers and drinking glasses
Santiago Donaher, Alessio Xompero, Andrea Cavallaro

TL;DR
This paper introduces a sound-based model for classifying the content type and amount in food containers and glasses, using a two-step process of action recognition and content classification, validated on the CORSMAL dataset.
Contribution
The paper presents a novel two-step model that improves accuracy in classifying container content by decomposing the problem into action recognition and content classification.
Findings
Achieves weighted average F1 scores of 76.02, 78.24, and 41.89 on the three test sets, respectively.
Outperforms the baselines in classifying content type and amount.
Validates the effectiveness of the two-step approach on the CORSMAL dataset.
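The weighted average F1 score quoted above is usually computed as the mean of per-class F1 scores, each weighted by the number of true samples of that class. The sketch below assumes this standard definition; the function name `weighted_f1` and the toy labels are illustrative and do not come from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores, as a percentage."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    support = Counter(y_true)  # number of true samples per class
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_true * f1  # weight each class by its support
    return 100.0 * total / len(y_true)


# Toy example with two hypothetical classes "a" and "b".
score = weighted_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Because each class is weighted by its support, frequent classes dominate the score, which matters when the content classes are unevenly represented across test sets.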
Abstract
Food containers, drinking glasses and cups handled by a person generate sounds that vary with the type and amount of their content. In this paper, we propose a new model for sound-based classification of the type and amount of content in a container. The proposed model is based on the decomposition of the problem into two steps, namely action recognition and content classification. We use the scenario of the recent CORSMAL Containers Manipulation dataset and consider two actions (shaking and pouring), and seven combinations of material and filling level. The first step identifies the action performed by a person with the container. The second step determines the amount and type of content using an action-specific classifier. Experiments show that the proposed model achieves 76.02, 78.24, and 41.89 weighted average F1 score on the three test sets, respectively, and outperforms baselines…
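The two-step decomposition described in the abstract can be sketched as a dispatcher: an action classifier picks the action, and the prediction is then delegated to a content classifier specific to that action. This is a minimal sketch of the control flow only. The feature representation, the threshold rules, and the class names (`class_1`, `class_3`, `class_5`) are hypothetical placeholders, not the authors' models or labels.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


def make_two_step_classifier(
    action_clf: Classifier,
    content_clfs: Dict[str, Classifier],
) -> Callable[[Features], Tuple[str, str]]:
    """Build a classifier that first predicts the action, then the content."""

    def classify(features: Features) -> Tuple[str, str]:
        # Step 1: recognise the action performed with the container.
        action = action_clf(features)
        if action not in content_clfs:
            raise KeyError(f"no content classifier for action {action!r}")
        # Step 2: classify content with the action-specific classifier.
        content = content_clfs[action](features)
        return action, content

    return classify


# Hypothetical stand-ins: simple threshold rules instead of trained models.
def toy_action_clf(x: Features) -> str:
    return "shaking" if x[0] > 0.5 else "pouring"


toy_content_clfs: Dict[str, Classifier] = {
    "shaking": lambda x: "class_3" if x[1] > 0.0 else "class_1",
    "pouring": lambda x: "class_5",
}

two_step = make_two_step_classifier(toy_action_clf, toy_content_clfs)
result = two_step([0.9, 1.0])
```

Keeping one content classifier per action lets each of them specialise in the sound patterns of a single action, which is the motivation the abstract gives for the decomposition.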
Taxonomy
Topics: Music and Audio Processing · Handwritten Text Recognition Techniques · Speech and Audio Processing
