Gradient-Informed Training for Low-Resource Multilingual Speech Translation
Ruiyan Sun, Satoshi Nakamura

TL;DR
This paper introduces a gradient-informed method to optimize layer sharing in low-resource multilingual speech translation, improving convergence and translation quality.
Contribution
It proposes a novel approach using gradient analysis to automatically determine optimal layer sharing patterns across languages.
Findings
Improved translation quality across four language pairs.
Gradient-based analysis effectively guides sharing pattern selection.
Method enhances convergence in low-resource settings.
Abstract
In low-resource multilingual speech-to-text translation, uniform architectural sharing across languages frequently introduces representation conflicts that impede convergence. This work proposes a principled methodology to automatically determine layer-specific sharing patterns by mining training gradient information. Our approach employs three distinct analysis strategies: distance-based language clustering, self/cross-task divergence metrics for capacity allocation, and joint factorization coupled with canonical correlation analysis for subspace alignment. Extensive evaluation across four language pairs, using the SeamlessM4T-Medium architecture, demonstrates consistent improvements in translation quality metrics.
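The first of the three strategies, distance-based language clustering, can be illustrated with a small sketch. The abstract does not specify the distance measure, the clustering procedure, or how gradients are collected, so everything below is an assumption made for illustration: cosine distance between per-language gradient vectors for a single layer, a greedy single-linkage grouping with a hand-picked threshold, and toy gradient values. The function names `cosine_distance_matrix` and `cluster_languages` are hypothetical and do not come from the paper.

```python
import numpy as np


def cosine_distance_matrix(grads):
    """Pairwise cosine distance between per-language gradient vectors.

    grads: array-like of shape (n_languages, n_params), one flattened
    gradient per language for a single layer (assumed input format).
    """
    g = np.asarray(grads, dtype=float)
    unit = g / np.linalg.norm(g, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def cluster_languages(dist, threshold):
    """Greedy single-linkage grouping (an assumed, illustrative procedure).

    Any two languages whose distance is below `threshold` end up in the
    same group; groups are merged transitively.
    """
    n = dist.shape[0]
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < threshold:
                old, new = labels[j], labels[i]
                # Merge the two groups by relabeling every member of `old`.
                labels = [new if label == old else label for label in labels]
    return labels


# Toy gradients for four hypothetical languages at one layer.
toy_grads = [
    [1.0, 0.0, 0.0],    # lang_a
    [0.9, 0.1, 0.0],    # lang_b, nearly parallel to lang_a
    [0.0, 1.0, 0.0],    # lang_c
    [0.0, 0.95, 0.05],  # lang_d, nearly parallel to lang_c
]
dist = cosine_distance_matrix(toy_grads)
labels = cluster_languages(dist, threshold=0.2)
print(labels)
```

In this toy setup, languages that land in the same group would be candidates for sharing the layer, while languages in different groups would get separate parameters. How the paper turns cluster assignments into an actual sharing pattern is not stated in the abstract.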
