AudioBERT: Audio Knowledge Augmented Language Model
Hyunjong Ok, Suho Yoo, Jaeho Lee

TL;DR
AudioBERT enhances language models with auditory knowledge using a retrieval-based method, addressing their lack of auditory understanding as revealed by AuditoryBench, and demonstrates improved performance on auditory tasks.
Contribution
This paper introduces AudioBERT, a novel retrieval-based approach to augment BERT with auditory knowledge, and presents AuditoryBench for evaluating auditory understanding in language models.
Findings
Language models pretrained only on text severely lack auditory knowledge.
AudioBERT improves auditory understanding on AuditoryBench.
Retrieval-based augmentation effectively injects audio knowledge.
Abstract
Recent studies have identified that language models, pretrained on text-only datasets, often lack elementary visual knowledge, e.g., colors of everyday objects. Motivated by this observation, we ask whether a similar shortcoming exists in terms of auditory knowledge. To answer this question, we construct a new dataset called AuditoryBench, which consists of two novel tasks for evaluating auditory knowledge. Based on our analysis using the benchmark, we find that language models also suffer from a severe lack of auditory knowledge. To address this limitation, we propose AudioBERT, a novel method to augment the auditory knowledge of BERT through a retrieval-based approach. First, we detect auditory knowledge spans in prompts to query our retrieval model efficiently. Then, we inject audio knowledge into BERT and switch on low-rank adaptation for effective adaptation…
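The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embedding vectors, the `audio_bank` dictionary, and the function names `cosine` and `retrieve_audio` are all hypothetical, and cosine similarity is assumed here as the matching score. The sketch only shows the general idea of using a detected auditory span as a query against a bank of audio embeddings; span detection, injection into BERT, and low-rank adaptation are not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def retrieve_audio(query_vec, audio_bank, top_k=1):
    """Return the names of the top_k audio entries most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in audio_bank.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical toy embeddings; a real system would use learned text and audio encoders.
audio_bank = {
    "dog_bark": [0.9, 0.1, 0.0],
    "violin": [0.0, 0.2, 0.9],
    "rain": [0.1, 0.9, 0.1],
}

# Hypothetical embedding of a detected auditory span such as "barking".
span_vec = [0.8, 0.2, 0.1]

best = retrieve_audio(span_vec, audio_bank)
print(best)
```

In this toy setup the span embedding lies closest to the `dog_bark` entry, so that entry is the one retrieved and would then be passed on to the language model.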
Taxonomy
Topics: Music and Audio Processing
Methods: Attention Is All You Need · Softmax · Layer Normalization · Dropout · WordPiece · Residual Connection · Attention Dropout · Linear Layer · Multi-Head Attention
