ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors

Yuguo Yin; Yuxin Xie; Wenyuan Yang; Dongchao Yang; Jinghan Ru; Xianwei Zhuang; Liming Liang; Yuexian Zou

arXiv:2502.14627 · cs.SD · June 5, 2025

TL;DR

This paper introduces ATRI, a novel approach to multilingual audio-text retrieval that reduces data distribution errors to improve consistency and recall across multiple languages, achieving state-of-the-art recall on multilingual datasets.
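The paper attributes the inconsistency to data distribution error caused by random sampling of languages. As a rough illustration only, the sketch below measures one possible proxy for that error: the average total-variation distance between each batch's caption-language frequencies and a uniform target when every sample's language is drawn at random. The function names, the eight placeholder language codes, and the choice of total-variation distance are assumptions for illustration, not the paper's definitions.

```python
import random
from collections import Counter


def language_distribution_error(languages, batch_size, num_batches, seed=0):
    """Average total-variation (TV) distance between each batch's caption-language
    frequencies and a uniform target, when every sample's language is drawn at random.

    Hypothetical proxy for illustration only; not the paper's definition.
    """
    rng = random.Random(seed)
    uniform = 1.0 / len(languages)
    total = 0.0
    for _ in range(num_batches):
        # Random sampling: each sample in the batch gets one language at random.
        counts = Counter(rng.choice(languages) for _ in range(batch_size))
        # TV distance is half the L1 gap between the two distributions.
        total += 0.5 * sum(abs(counts[lang] / batch_size - uniform) for lang in languages)
    return total / num_batches


def all_language_batch_error(languages, batch_size):
    """Same TV distance when every clip in the batch carries captions in all k languages."""
    k = len(languages)
    n_texts = batch_size * k
    counts = Counter({lang: batch_size for lang in languages})
    return 0.5 * sum(abs(counts[lang] / n_texts - 1.0 / k) for lang in languages)


if __name__ == "__main__":
    # Eight placeholder language codes (hypothetical; the actual language list is not given here).
    langs = [f"lang{i}" for i in range(8)]
    for bs in (32, 256, 2048):
        print(bs, round(language_distribution_error(langs, bs, num_batches=50), 4))
    print("all languages per clip:", all_language_batch_error(langs, 32))
```

Under random sampling the gap shrinks as the batch grows but stays positive, whereas pairing every clip with captions in all k languages makes each language exactly 1/k of the text side.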

Contribution

The paper presents a theoretical analysis of inconsistencies in ML-ATR and proposes a new scheme based on two contrastive objectives, 1-to-k contrastive learning and audio-English co-anchor contrastive learning, to mitigate data distribution errors.
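The proposed scheme builds on contrastive learning between audio and text embeddings. For orientation, the sketch below shows only the standard symmetric InfoNCE loss used in CLIP-style audio-text retrieval, written in plain Python for toy embeddings. It is a generic baseline, not the paper's 1-to-k or co-anchor objective; the function names and the temperature of 0.07 are illustrative assumptions.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def log_softmax_at(row, idx):
    """log softmax(row)[idx], computed stably with the max-subtraction trick."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def symmetric_info_nce(audio, text, temperature=0.07):
    """CLIP-style symmetric InfoNCE: audio[i] and text[i] form the positive pair,
    and every other item in the batch serves as a negative."""
    a = [l2_normalize(v) for v in audio]
    t = [l2_normalize(v) for v in text]
    n = len(a)
    # logits[i][j] = cosine(audio i, text j) / temperature
    logits = [[sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t] for ai in a]
    audio_to_text = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_audio = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return 0.5 * (audio_to_text + text_to_audio)
```

With correctly paired embeddings the loss approaches zero; pairing each audio clip with the wrong caption pushes it toward roughly 1/temperature.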

Findings

01

Achieves state-of-the-art recall on multilingual datasets.

02

Improves consistency across eight languages.

03

Reduces data distribution errors in ML-ATR.
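The consistency finding refers to instance similarity matching that agrees across languages. This page does not define the paper's consistency metric, so the sketch below uses a hypothetical proxy: given one audio-to-caption similarity matrix per language, where caption j in every language is a translation of the same source caption, it reports the fraction of audio queries whose top-1 caption index is identical in every language.

```python
def top1(row):
    """Index of the highest-scoring caption for one audio query."""
    return max(range(len(row)), key=row.__getitem__)


def cross_language_consistency(sims_by_language):
    """Fraction of audio queries whose top-1 caption index agrees across all languages.

    sims_by_language maps a language code to an (audio x caption) similarity matrix;
    caption j must be a translation of the same source caption in every language.
    Hypothetical proxy metric, not the paper's definition.
    """
    matrices = list(sims_by_language.values())
    n_queries = len(matrices[0])
    agree = 0
    for q in range(n_queries):
        picks = {top1(sims[q]) for sims in matrices}
        if len(picks) == 1:
            agree += 1
    return agree / n_queries
```

A score of 1.0 means every audio query retrieves the same caption regardless of language; lower values indicate that the retrieval result depends on which language the captions are in.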

Abstract

Multilingual audio-text retrieval (ML-ATR) is a challenging task that aims to retrieve audio clips or multilingual texts from databases. However, existing ML-ATR schemes suffer from inconsistencies in instance similarity matching across languages. We theoretically analyze this inconsistency in terms of both the multilingual modal alignment direction error and the weight error, and derive a theoretical upper bound on the weight error to quantify it. Based on this upper bound, we find that the inconsistency stems from the data distribution error caused by random sampling of languages. We propose a consistent ML-ATR scheme using 1-to-k contrastive learning and audio-English co-anchor contrastive learning, aiming to mitigate the negative impact of data distribution error on recall and consistency in ML-ATR. Experimental results on the translated…
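The abstract does not spell out the 1-to-k objective. One plausible reading, assumed here rather than taken from the paper, is that each audio clip is paired with its captions in all k languages at once, and all k captions count as positives in a single softmax over the batch's n × k captions. The sketch below implements that multi-positive InfoNCE reading in the audio-to-text direction only; the audio-English co-anchor term is not sketched, and every name in it is hypothetical.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def one_to_k_loss(audio, captions, temperature=0.07):
    """Hypothetical multi-positive InfoNCE (audio -> text direction only).

    audio: list of n audio embeddings.
    captions: captions[i][l] is the embedding of clip i's caption in language l.
    Each audio query is scored against all n * k captions; its own k captions are positives.
    """
    a = [_normalize(v) for v in audio]
    flat = []  # (owner clip index, normalized caption embedding)
    for i, per_lang in enumerate(captions):
        for emb in per_lang:
            flat.append((i, _normalize(emb)))
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, t)) / temperature for _, t in flat]
        pos = [logit for (owner, _), logit in zip(flat, logits) if owner == i]
        # -log of the softmax mass that falls on any of the clip's k positives
        total += _logsumexp(logits) - _logsumexp(pos)
    return total / len(a)
```

Because every language appears for every clip in each step, this reading avoids the per-step language imbalance that random language sampling introduces; whether it matches the paper's exact formulation is not confirmed here.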

Code & Models

Repositories

atri-acl/atri-acl (official)

Videos

ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors · Underline

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Diverse Musicological Studies

Methods: Contrastive Learning