Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering

Aris J. Aristorenas

arXiv:2411.00195·cs.SD·November 4, 2024

Open Access

TL;DR

This paper introduces a machine learning framework that extracts MFCC, Chroma, Spectral Contrast, and Temporal features from audio to assess content similarity and predict comment-derived sentiment scores, reporting RMSE values between 2.783 and 5.482 on a 0-100 scale.

Contribution

It presents a novel combination of feature extraction and regression modeling for sentiment prediction in audio content, with a new dataset of YouTube music covers and original songs.

Findings

01

Achieved RMSE values of 3.420 (MFCC), 5.482 (Chroma), 2.783 (Spectral Contrast), and 4.212 (Temporal) when predicting sentiment scores on a 0-100 scale

02

Demonstrated the effectiveness of MFCC, Chroma, Spectral Contrast, and Temporal features

03

Improved on a baseline model based on absolute difference metrics
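
For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of how root mean square error (RMSE) is computed between predicted and reference sentiment scores. The function name `rmse` and the example scores are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences of scores."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one score is required")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sentiment scores on a 0-100 scale (illustrative, not from the paper).
predicted_scores = [72.0, 65.0, 80.0, 58.0]
actual_scores = [70.0, 68.0, 77.0, 62.0]
error = rmse(predicted_scores, actual_scores)
print(f"RMSE: {error:.3f}")
```

Because the errors are squared before averaging, RMSE is expressed in the same units as the scores and penalizes large misses more heavily than small ones.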

Abstract

This study presents a machine learning framework for assessing similarity between audio content and predicting sentiment scores. We construct a dataset containing audio samples from music covers on YouTube along with the audio of the original song, and sentiment scores derived from user comments, serving as proxy labels for content quality. Our approach involves extensive pre-processing, segmenting audio signals into 30-second windows, and extracting high-dimensional feature representations through Mel-Frequency Cepstral Coefficients (MFCC), Chroma, Spectral Contrast, and Temporal characteristics. Leveraging these features, we train regression models to predict sentiment scores on a 0-100 scale, achieving root mean square error (RMSE) values of 3.420, 5.482, 2.783, and 4.212 for the MFCC, Chroma, Spectral Contrast, and Temporal features, respectively. Improvements over a baseline model based on absolute difference metrics are observed. These results…
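
The 30-second windowing step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not say whether windows overlap or how a trailing partial window is handled, so the sketch assumes non-overlapping windows and drops the remainder by default. The function `segment_signal`, its parameters, and the toy sample rate are all hypothetical.

```python
def segment_signal(samples, sample_rate, window_seconds=30.0, keep_partial=False):
    """Split a 1-D sequence of audio samples into fixed-length windows.

    Assumptions (not confirmed by the paper): windows do not overlap, and a
    trailing window shorter than `window_seconds` is dropped unless
    `keep_partial` is True.
    """
    window_length = int(round(window_seconds * sample_rate))
    if window_length <= 0:
        raise ValueError("window_seconds * sample_rate must be positive")
    windows = []
    for start in range(0, len(samples), window_length):
        window = samples[start:start + window_length]
        if len(window) == window_length or keep_partial:
            windows.append(window)
    return windows


# Toy signal: 75 seconds at a hypothetical 10 Hz sample rate (750 samples).
toy_sample_rate = 10
toy_samples = list(range(75 * toy_sample_rate))

full_windows = segment_signal(toy_samples, toy_sample_rate)
all_windows = segment_signal(toy_samples, toy_sample_rate, keep_partial=True)
print(len(full_windows), len(all_windows))
```

In a full pipeline, each window would then be passed to a feature extractor (for example MFCC or Chroma) before regression; that step is omitted here because the abstract does not give the extraction settings.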

Taxonomy

Topics: Music and Audio Processing