Density Adaptive Attention is All You Need: Robust Parameter-Efficient Fine-Tuning Across Multiple Modalities

Georgios Ioannides; Aman Chadha; Aaron Elkins

arXiv:2401.11143 · cs.LG · October 1, 2024 · 2 cites

TL;DR

This paper introduces a probabilistic attention mechanism called DAAM that improves parameter-efficient fine-tuning across speech, text, and vision modalities, achieving significant accuracy gains and better adaptability for non-stationary data.

Contribution

The paper presents the DAAM framework and the Density Adaptive Transformer (DAT), advancing multi-modal attention through probabilistic modeling and improved explainability.

Findings

1. Up to approximately +20% (absolute) accuracy improvement over state-of-the-art attention techniques
2. Superior performance on speech, image, and text classification tasks
3. Enhanced robustness and interpretability via the Importance Factor

Abstract

We propose the Multi-Head Density Adaptive Attention Mechanism (DAAM), a novel probabilistic attention framework that can be used for Parameter-Efficient Fine-tuning (PEFT), and the Density Adaptive Transformer (DAT), designed to enhance information aggregation across multiple modalities, including Speech, Text, and Vision. DAAM integrates learnable mean and variance into its attention mechanism, implemented in a multi-head framework, enabling it to collectively model any probability distribution for dynamic recalibration of feature significance. This method demonstrates significant improvements, especially with highly non-stationary data, surpassing the state-of-the-art attention techniques in model performance, up to approximately +20% (abs.) in accuracy. Empirically, DAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in…
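The abstract's core idea, reweighting features by a learned probability density split across several heads, can be illustrated with a short sketch. This is not the authors' implementation. It assumes that each head takes its own slice of the feature vector, computes that slice's mean and variance, shifts the mean by a learnable offset, scales the variance by a learnable factor, and multiplies the features by the resulting Gaussian weights. The names `gaussian_head`, `density_adaptive_attention`, `mean_offsets`, and `scales` are hypothetical. The parameters here are plain floats, not trained values.

```python
import math
from typing import List, Sequence


def gaussian_head(x: Sequence[float], mean_offset: float, scale: float,
                  eps: float = 1e-8) -> List[float]:
    """Reweight one slice of features by a Gaussian density.

    Hypothetical sketch: `mean_offset` shifts the slice's empirical mean and
    `scale` multiplies its empirical variance. Both stand in for the learnable
    mean and variance described in the abstract.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    center = mu + mean_offset              # learnable shift of the mean
    denom = 2.0 * scale * (var + eps)      # learnable rescaling of the variance
    # Each weight lies in (0, 1] and equals 1 exactly at the shifted mean.
    weights = [math.exp(-((v - center) ** 2) / denom) for v in x]
    return [v * w for v, w in zip(x, weights)]


def density_adaptive_attention(x: Sequence[float],
                               mean_offsets: Sequence[float],
                               scales: Sequence[float],
                               eps: float = 1e-8) -> List[float]:
    """Split features into equal slices, apply one Gaussian head per slice, then concatenate."""
    num_heads = len(mean_offsets)
    if num_heads == 0 or len(scales) != num_heads:
        raise ValueError("need one (mean_offset, scale) pair per head")
    if len(x) % num_heads != 0:
        raise ValueError("feature length must be divisible by the number of heads")
    size = len(x) // num_heads
    out: List[float] = []
    for h in range(num_heads):
        chunk = x[h * size:(h + 1) * size]
        out.extend(gaussian_head(chunk, mean_offsets[h], scales[h], eps))
    return out


# Usage: two heads over a 4-dimensional feature vector.
features = [1.0, 2.0, 3.0, 4.0]
print(density_adaptive_attention(features, mean_offsets=[0.0, 0.0], scales=[1.0, 1.0]))
# ≈ [0.607, 1.213, 1.820, 2.426]
```

Every Gaussian weight lies in (0, 1], so in this simplified reading a head damps each feature according to its distance from the (shifted) mean. Giving each head its own offset and scale lets different slices of the representation be emphasized differently.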

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Sentiment Analysis and Opinion Mining

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Layer Normalization · Label Smoothing · Residual Connection · Dropout · Linear Layer · Byte Pair Encoding · Adam