Cross-Modal Bayesian Low-Rank Adaptation for Uncertainty-Aware Multimodal Learning

Habibeh Naderi; Behrouz Haji Soleimani; Stan Matwin

arXiv:2604.16657·cs.LG·April 21, 2026

TL;DR

CALIBER is a Bayesian low-rank adaptation framework for audio-text multimodal learning. It conditions the adapter's variational posterior on text-audio cross-attention, which improves uncertainty estimation and performance on low-resource audio-text tasks.

Contribution

The paper introduces a Bayesian PEFT method that conditions the low-rank adaptation on cross-attention between modalities, enabling uncertainty-aware multimodal learning while remaining parameter-efficient.

Findings

01. CALIBER matches or outperforms existing baselines in diverse tasks.
02. Token-level cross-attention provides the most consistent performance gains.
03. The framework effectively estimates heteroscedastic uncertainty in multimodal settings.

Abstract

Large pre-trained language models are increasingly adapted to downstream tasks using parameter-efficient fine-tuning (PEFT), but existing PEFT methods are typically deterministic and unimodal, making them poorly suited for low-resource multimodal settings where predictive uncertainty and cross-modal reliability both matter. We introduce CALIBER (Context-Aware Low-rank Inference with Bayesian Embedding Regularization), a multimodal uncertainty-aware PEFT framework for audio-text learning. CALIBER extends Bayesian low-rank adaptation by conditioning the variational posterior in the adapter space on per-layer, token-level text-audio cross-attention. Specifically, text-derived low-rank features attend to frame-level audio embeddings to produce localized acoustic context, which then modulates the mean and variance of a compact stochastic latent matrix within the rank-$r$ adapter space. This…
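
The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names (A, B, P_audio, W_mu, W_sigma, mu), the per-row mean shift and log standard deviation, and the toy dimensions are all assumptions chosen for illustration, and the code uses only the Python standard library. It shows one text token being projected into a rank-r space, attending over audio frames, and using the resulting context to set the mean and variance of an r x r latent matrix that is sampled with the reparameterization trick.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialized matrix (illustrative stand-in for learned weights)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a list of frames."""
    scale = math.sqrt(len(query))
    weights = softmax(
        [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    )
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(query))]


def adapter_delta(token, audio_frames, params, rng):
    """Sample a stochastic low-rank update for one text token.

    All parameter names and the modulation scheme are hypothetical.
    """
    h = matvec(params["A"], token)  # text feature in the rank-r space
    frames_r = [matvec(params["P_audio"], f) for f in audio_frames]
    context = attend(h, frames_r, frames_r)  # localized acoustic context
    shift = matvec(params["W_mu"], context)  # assumed per-row mean shift
    log_sigma = matvec(params["W_sigma"], context)  # assumed per-row log std
    r = len(h)
    # Reparameterized sample: z = mean + std * eps, with eps ~ N(0, 1).
    z = [[params["mu"][i][j] + shift[i]
          + math.exp(log_sigma[i]) * rng.gauss(0.0, 1.0)
          for j in range(r)] for i in range(r)]
    return matvec(params["B"], matvec(z, h)), log_sigma


rng = random.Random(0)
d_text, d_audio, d_out, r = 8, 6, 8, 2  # toy sizes, not from the paper
params = {
    "A": rand_matrix(r, d_text, rng),
    "B": rand_matrix(d_out, r, rng),
    "P_audio": rand_matrix(r, d_audio, rng),
    "W_mu": rand_matrix(r, r, rng),
    "W_sigma": rand_matrix(r, r, rng),
    "mu": [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)],
}
token = [rng.gauss(0.0, 1.0) for _ in range(d_text)]
audio_frames = [[rng.gauss(0.0, 1.0) for _ in range(d_audio)] for _ in range(5)]

# Repeated sampling gives different updates; their spread reflects uncertainty.
results = [adapter_delta(token, audio_frames, params, rng) for _ in range(4)]
samples = [delta for delta, _ in results]
log_sigmas = [ls for _, ls in results]
```

Because the standard deviation is computed from the audio context rather than fixed, the spread of the sampled updates varies with the input, which is the sense in which such a posterior can express heteroscedastic uncertainty. In this sketch the update would be added to the output of a frozen pre-trained layer, as in standard low-rank adaptation.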
