Objective Soups: Multilingual Multi-Task Modeling for Speech Processing
A F M Saif, Lisha Chen, Xiaodong Cui, Songtao Lu, Brian Kingsbury, and Tianyi Chen

TL;DR
This paper explores hierarchical multi-objective optimization strategies for multilingual multi-task speech processing, demonstrating that separating recognition and translation tasks improves model performance and scalability.
Contribution
It introduces three multi-objective formulations with a lightweight layer-selection mechanism, showing hierarchical optimization outperforms flat methods in speech processing tasks.
Findings
Hierarchical multi-objective optimization (MOO) outperforms flat optimization in speech tasks.
Layer-selection reduces computational overhead.
Bi-level separation improves model accuracy.
Abstract
Training a single model for multilingual, multi-task speech processing (MSP) is severely hampered by conflicting objectives between tasks like speech recognition and translation. While multi-objective optimization (MOO) aims to align gradient updates, its effectiveness diminishes as the number of tasks grows, making it difficult to find a common descent direction. This raises a fundamental question: should highly conflicting objectives be optimized jointly or separated into a hierarchical structure? To address this question, this paper investigates three multi-objective MSP formulations, which we refer to as objective soup recipes. These formulations apply multi-objective optimization at different optimization levels to mitigate potential conflicts among all objectives. To ensure efficiency, we introduce a lightweight layer-selection mechanism that computes the…
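To make the phrase "common descent direction" concrete, here is a minimal sketch of the classic two-gradient min-norm rule, the closed form used by MGDA-style methods. It is an illustration of the general idea only, not the paper's objective soup recipes or its layer-selection mechanism, and the names `dot` and `min_norm_direction` are hypothetical. Given two task gradients g1 and g2, it picks the weight alpha in [0, 1] that minimizes the norm of alpha*g1 + (1 - alpha)*g2; stepping against the resulting vector does not increase either loss to first order.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_direction(g1, g2):
    """Return (alpha, d), where d = alpha*g1 + (1 - alpha)*g2 has minimum norm.

    Illustrative two-task rule only; helper names are assumptions, not
    taken from the paper.
    """
    diff = [b - a for a, b in zip(g1, g2)]  # g2 - g1
    denom = dot(diff, diff)                 # ||g1 - g2||^2
    if denom == 0.0:
        # Identical gradients: any convex combination gives the same vector.
        return 0.5, list(g1)
    # Unconstrained minimizer of ||g2 + alpha*(g1 - g2)||^2, then clip to [0, 1].
    alpha = dot(diff, g2) / denom
    alpha = max(0.0, min(1.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d


if __name__ == "__main__":
    # Two partially conflicting gradients (made-up numbers for illustration).
    g_asr = [1.0, 0.0]
    g_st = [-1.0, 0.5]
    alpha, d = min_norm_direction(g_asr, g_st)
    # Both inner products are non-negative, so a step along -d does not
    # increase either loss to first order.
    print(alpha, d, dot(d, g_asr), dot(d, g_st))
```

With many tasks the same min-norm problem becomes a quadratic program over the simplex, and the combined vector tends to shrink as gradients conflict more, which is one way to read the abstract's remark that a common descent direction gets harder to find as the number of tasks grows.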
