Personalized Audio Quality Preference Prediction
Chung-Che Wang, Yu-Chun Lin, Yu-Teng Hsu, Jyh-Shing Roger Jang

TL;DR
This study develops a siamese network-based method to predict individual audio quality preferences by incorporating both audio features and subject information, achieving improved accuracy over baseline models.
Contribution
It introduces a personalized preference prediction model that combines audio input with subject information (age, gender, and headphone or earphone specifications) and outperforms a baseline that uses only audio input.
Findings
Overall accuracy increased from 77.56% for an audio-only baseline to 78.04% for the best model, a gain of 0.48 percentage points.
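Using all subject information (age, gender, and headphone or earphone specifications) is more effective than using only part of it.
Among the structures tested, an LDNet with PANNs' CNN6 as the encoder and a multi-layer perceptron block as the decoder gives the largest improvement over the audio-only baseline.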
Abstract
This paper proposes to use both audio input and subject information to predict the personalized preference between two audio segments with the same content but different qualities. A siamese network is used to compare the inputs and predict the preference. Several structures for each side of the siamese network are investigated. The one that improves most over a baseline model using only audio input is an LDNet with PANNs' CNN6 as the encoder and a multi-layer perceptron block as the decoder, which raises overall accuracy from 77.56% to 78.04%. Experimental results also show that using all the subject information, including age, gender, and the specifications of headphones or earphones, is more effective than using only part of it.
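The siamese comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `SiamesePreferenceSketch`, the single dense layer standing in for the audio encoder, the concatenation of subject features before the decoder, and the use of a score difference passed through a sigmoid are all assumptions made for the example. The actual LDNet and PANNs CNN6 components are not reproduced here.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def linear(layer, x):
    """Apply a dense layer to the vector x."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


class SiamesePreferenceSketch:
    """Hypothetical siamese scorer: both sides share the same weights."""

    def __init__(self, n_audio, n_subject, n_hidden=8, seed=0):
        rng = random.Random(seed)
        # Stand-in for an audio encoder (assumption: one dense layer).
        self.encoder = make_linear(n_audio, n_hidden, rng)
        # Stand-in for a decoder that sees audio embedding + subject info.
        self.decoder = make_linear(n_hidden + n_subject, 1, rng)

    def score(self, audio_feats, subject):
        """One side of the network: a quality score for this listener."""
        h = relu(linear(self.encoder, audio_feats))
        return linear(self.decoder, h + subject)[0]

    def prefer_a(self, audio_a, audio_b, subject):
        """Probability that the listener prefers segment A over segment B."""
        diff = self.score(audio_a, subject) - self.score(audio_b, subject)
        return 1.0 / (1.0 + math.exp(-diff))


model = SiamesePreferenceSketch(n_audio=4, n_subject=3)
# Hypothetical subject vector: scaled age, gender flag, one headphone spec.
subject = [0.3, 1.0, 0.5]
a = [0.2, 0.1, 0.9, 0.4]  # made-up audio features for segment A
b = [0.1, 0.3, 0.5, 0.7]  # made-up audio features for segment B
p_ab = model.prefer_a(a, b, subject)
p_ba = model.prefer_a(b, a, subject)
```

Because both segments pass through the same weights, swapping the inputs flips the prediction (`p_ab + p_ba` equals 1 up to rounding), and two identical segments yield a probability of 0.5. That symmetry is the main reason to share weights in a pairwise comparison like this.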
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Siamese Network
