Deep F-measure Maximization for End-to-End Speech Understanding
Leda Sarı, Mark Hasegawa-Johnson

TL;DR
This paper introduces a differentiable F-measure maximization method for training neural networks, improving fairness and class coverage in speech understanding and other tasks by addressing label imbalance.
Contribution
It proposes a novel differentiable approximation to the F-measure and demonstrates its effectiveness across multiple datasets and tasks.
Findings
Up to 8% absolute improvement in micro-F1 scores.
Significant increase in class coverage and positive recall.
Effective across speech and non-speech datasets.
Abstract
Spoken language understanding (SLU) datasets, like many other machine learning datasets, usually suffer from the label imbalance problem. Label imbalance usually causes the learned model to replicate similar biases at the output which raises the issue of unfairness to the minority classes in the dataset. In this work, we approach the fairness problem by maximizing the F-measure instead of accuracy in neural network model training. We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation. We perform experiments on two standard fairness datasets, Adult, and Communities and Crime, and also on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset. In all four of these tasks, F-measure maximization results in improved micro-F1 scores, with absolute…
