Balancing Multi-modal Sensor Learning via Multi-objective Optimization

Heshan Fernando; Quan Xiao; Parikshit Ram; Yi Zhou; Horst Samulowitz; Nathalie Baracaldo; Tianyi Chen

arXiv:2511.06686 · cs.LG · April 1, 2026


TL;DR

This paper introduces MIMO, a gradient-based multi-objective optimization method that balances learning across sensing modalities in control systems, improving performance while reducing computational cost.

Contribution

The paper reformulates multi-modal sensor learning as a multi-objective optimization problem and proposes MIMO, a simple and efficient method that comes with convergence guarantees and outperforms existing balancing methods.

Findings

01. MIMO outperforms state-of-the-art balancing methods.

02. MIMO achieves up to roughly a 20x reduction in computation time.

03. MIMO improves robustness under sensing perturbations.

Abstract

Learning-enabled control systems increasingly rely on multiple sensing modalities (e.g., vision, audio, and language) for perception and decision support. A key challenge is that multi-modal sensor training dynamics are often imbalanced: fast-to-learn sensing channels dominate optimization, while slower channels remain underutilized, degrading reliability under sensing perturbations. Existing balancing strategies are largely heuristic and can require computationally intensive subroutines. In this paper, we reformulate multi-modal sensor learning as a multi-objective optimization (MOO) problem that explicitly prioritizes the worst-performing modality while retaining the nominal multi-modal sensor fusion objective. We then propose a simple gradient-based method, MIMO (multi-modal sensor learning via MOO), for the resulting formulation. We provide convergence guarantees and evaluate the…
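The abstract names two objectives, the nominal fusion loss and the loss of the worst-performing modality, but does not spell out the MIMO update. The sketch below is therefore not the authors' algorithm. It illustrates one standard way to take a single gradient step on such a pair of objectives, using the closed-form two-objective min-norm direction from MGDA (a swapped-in technique). The helper names `worst_modality`, `min_norm_direction` and `balanced_step` are hypothetical, and gradients are plain Python lists purely for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def worst_modality(modality_losses: Sequence[float]) -> int:
    """Index of the worst-performing modality, i.e. the one with the largest loss."""
    return max(range(len(modality_losses)), key=lambda m: modality_losses[m])


def min_norm_direction(g_fusion: Sequence[float], g_worst: Sequence[float]) -> Vector:
    """Min-norm point in the convex hull of two gradients (two-objective MGDA).

    Solves min_{alpha in [0, 1]} ||alpha * g_fusion + (1 - alpha) * g_worst||^2
    in closed form. This is a standard MOO direction, NOT the MIMO update.
    """
    diff = [a - b for a, b in zip(g_fusion, g_worst)]
    denom = dot(diff, diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every convex weight gives the same vector
    else:
        # Zero derivative in alpha gives (g_worst - g_fusion) . g_worst / ||diff||^2.
        alpha = dot([b - a for a, b in zip(g_fusion, g_worst)], g_worst) / denom
        alpha = min(1.0, max(0.0, alpha))  # project back onto [0, 1]
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g_fusion, g_worst)]


def balanced_step(
    theta: Sequence[float],
    g_fusion: Sequence[float],
    modality_grads: Sequence[Sequence[float]],
    modality_losses: Sequence[float],
    lr: float = 0.1,
) -> Vector:
    """One hypothetical descent step on (fusion loss, max over modality losses).

    When the worst modality is unique, the gradient of max_m L_m equals the
    gradient of that modality's loss, so modality_grads[worst] is used.
    """
    m = worst_modality(modality_losses)
    d = min_norm_direction(g_fusion, modality_grads[m])
    return [t - lr * di for t, di in zip(theta, d)]


# Toy usage: the fusion gradient and the worst modality's gradient are orthogonal,
# so the min-norm direction weights them equally.
new_theta = balanced_step(
    theta=[0.0, 0.0],
    g_fusion=[1.0, 0.0],
    modality_grads=[[0.0, 0.5], [0.0, 1.0]],
    modality_losses=[0.1, 0.8],
)
print(new_theta)
```

The min-norm direction has a non-negative inner product with both gradients, so a small enough step does not increase either objective to first order. Whether MIMO uses this weighting or a different one is not stated in the excerpt above.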
