Cross-Modal Bayesian Low-Rank Adaptation for Uncertainty-Aware Multimodal Learning
Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin

TL;DR
CALIBER is a Bayesian low-rank adaptation framework for multimodal learning that enhances uncertainty estimation by conditioning on cross-modal attention, improving performance in low-resource audio-text tasks.
Contribution
It introduces a Bayesian PEFT method that conditions the adapter's variational posterior on text-audio cross-attention, enabling uncertainty-aware multimodal learning while remaining parameter-efficient.
Findings
CALIBER matches or outperforms existing baselines in diverse tasks.
Token-level cross-attention provides the most consistent performance gains.
The framework effectively estimates heteroscedastic uncertainty in multimodal settings.
Abstract
Large pre-trained language models are increasingly adapted to downstream tasks using parameter-efficient fine-tuning (PEFT), but existing PEFT methods are typically deterministic and unimodal, making them poorly suited for low-resource multimodal settings where predictive uncertainty and cross-modal reliability both matter. We introduce CALIBER (Context-Aware Low-rank Inference with Bayesian Embedding Regularization), a multimodal uncertainty-aware PEFT framework for audio-text learning. CALIBER extends Bayesian low-rank adaptation by conditioning the variational posterior in the adapter space on per-layer, token-level text-audio cross-attention. Specifically, text-derived low-rank features attend to frame-level audio embeddings to produce localized acoustic context, which then modulates the mean and variance of a compact stochastic latent matrix within the rank-r adapter space. This…
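The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: every name here (init_params, adapter_forward, Wq, Wk, Wv, W_mu, W_logvar, b_logvar), the per-token r x r shape of the latent matrix, the identity-centred mean, and the initial values are assumptions made for illustration only. The sketch covers the forward pass alone; the KL regularization term that a variational method would add to the training loss is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(rng, d_in, d_out, d_audio, r, d_k):
    # Hypothetical parameter layout; not taken from the paper.
    return {
        "A": rng.normal(0.0, 0.02, (d_in, r)),      # LoRA down-projection
        "B": rng.normal(0.0, 0.02, (r, d_out)),     # LoRA up-projection
        "Wq": rng.normal(0.0, 0.1, (r, d_k)),       # queries from low-rank text features
        "Wk": rng.normal(0.0, 0.1, (d_audio, d_k)), # keys from audio frames
        "Wv": rng.normal(0.0, 0.1, (d_audio, r)),   # values from audio frames
        "W_mu": np.zeros((r, r * r)),               # context -> mean offset
        "W_logvar": np.zeros((r, r * r)),           # context -> log-variance
        "b_logvar": np.full(r * r, -6.0),           # assumed small initial variance
    }

def adapter_forward(text_h, audio_h, p, rng=None):
    """text_h: (T, d_in) token states; audio_h: (F, d_audio) frame embeddings.

    Returns the adapter update (T, d_out), the attention map (T, F),
    and the per-token standard deviations (T, r, r).
    Pass rng=None to use the posterior mean instead of sampling.
    """
    r = p["A"].shape[1]
    low = text_h @ p["A"]                            # (T, r) text-derived low-rank features
    q = low @ p["Wq"]                                # (T, d_k)
    k = audio_h @ p["Wk"]                            # (F, d_k)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))    # (T, F) token-to-frame attention
    ctx = attn @ (audio_h @ p["Wv"])                 # (T, r) localized acoustic context

    # The context modulates the mean and variance of a per-token r x r latent matrix.
    mu = np.eye(r) + (ctx @ p["W_mu"]).reshape(-1, r, r)
    std = np.exp(0.5 * (ctx @ p["W_logvar"] + p["b_logvar"])).reshape(-1, r, r)

    # Reparameterized sample: z = mu + std * eps.
    eps = rng.standard_normal(mu.shape) if rng is not None else 0.0
    z = mu + std * eps

    delta = np.einsum("tr,trs->ts", low, z) @ p["B"] # (T, d_out) adapter update
    return delta, attn, std
```

With W_mu initialized to zero and sampling disabled, the latent matrix is the identity and the update reduces to an ordinary LoRA update; this initialization is a choice made for the sketch. Calling adapter_forward several times with a random generator and measuring the spread of the outputs gives a simple Monte Carlo estimate of predictive uncertainty.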
