Density Adaptive Attention is All You Need: Robust Parameter-Efficient Fine-Tuning Across Multiple Modalities
Georgios Ioannides, Aman Chadha, Aaron Elkins

TL;DR
This paper introduces DAAM, a probabilistic attention mechanism for parameter-efficient fine-tuning across speech, text, and vision. It reports accuracy gains of up to approximately +20% (absolute) over state-of-the-art attention techniques, with the largest improvements on highly non-stationary data.
Contribution
The paper presents the DAAM framework and the Density Adaptive Transformer (DAT), which use probabilistic modeling to aggregate information across modalities and to improve explainability.
Findings
Up to approximately +20% (absolute) accuracy improvement over state-of-the-art attention techniques
Superior performance on speech, image, and text classification tasks
Enhanced robustness and interpretability with the Importance Factor
Abstract
We propose the Multi-Head Density Adaptive Attention Mechanism (DAAM), a novel probabilistic attention framework that can be used for Parameter-Efficient Fine-tuning (PEFT), and the Density Adaptive Transformer (DAT), designed to enhance information aggregation across multiple modalities, including Speech, Text, and Vision. DAAM integrates learnable mean and variance into its attention mechanism, implemented in a multi-head framework, enabling it to collectively model any probability distribution for dynamic recalibration of feature significance. This method demonstrates significant improvements, especially with highly non-stationary data, surpassing the state-of-the-art attention techniques in model performance, up to approximately +20% (abs.) in accuracy. Empirically, DAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in…
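The abstract describes attention driven by a learnable mean and variance, applied per head. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each head reweights its slice of the feature vector by a Gaussian density whose centre is the data mean plus a learnable offset and whose width is set by a learnable scale. The names `gaussian_weights`, `multi_head_density_gate`, `mean_offset`, and `scale` are hypothetical, and the parameters are plain floats here rather than trained values.

```python
import math
from statistics import fmean


def gaussian_weights(x, mean_offset=0.0, scale=1.0, eps=1e-8):
    """Return one Gaussian density weight per feature in x.

    mean_offset and scale stand in for learnable parameters (assumed names);
    scale must be positive.
    """
    data_mean = fmean(x)
    data_var = fmean([(v - data_mean) ** 2 for v in x])
    centre = data_mean + mean_offset           # shifted mean
    std = math.sqrt(data_var + eps)            # eps guards constant inputs
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * scale)
    return [
        norm * math.exp(-(((v - centre) / std) ** 2) / (2.0 * scale ** 2))
        for v in x
    ]


def multi_head_density_gate(x, head_params):
    """Split x into equal chunks, one per head, and gate each chunk.

    head_params is a list of (mean_offset, scale) pairs, one per head.
    """
    n_heads = len(head_params)
    if n_heads == 0 or len(x) % n_heads != 0:
        raise ValueError("len(x) must be a positive multiple of the head count")
    size = len(x) // n_heads
    gated = []
    for h, (mean_offset, scale) in enumerate(head_params):
        chunk = x[h * size:(h + 1) * size]
        weights = gaussian_weights(chunk, mean_offset, scale)
        # Element-wise reweighting of the features by their density weight.
        gated.extend(v * w for v, w in zip(chunk, weights))
    return gated


features = [1.0, 2.0, 3.0, 4.0]
gated = multi_head_density_gate(features, [(0.0, 1.0), (0.5, 2.0)])
```

In this sketch, features close to a head's shifted mean keep more of their magnitude, while features far from it are suppressed. Giving each head its own offset and scale lets the heads emphasize different regions of the feature distribution, which is one way to read the abstract's statement that the heads collectively model a distribution.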
Taxonomy
Topics: Topic Modeling · Advanced Neural Network Applications · Sentiment Analysis and Opinion Mining
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Layer Normalization · Label Smoothing · Residual Connection · Dropout · Linear Layer · Byte Pair Encoding · Adam
