Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation

Yoori Oh; Yoseob Han; Kyogu Lee

arXiv:2405.00367·cs.IR·May 2, 2024

TL;DR

This paper introduces a distance sampling-based paraphraser leveraging ChatGPT to generate diverse text data, addressing data imbalance in audio-language retrieval and improving task performance.

Contribution

It presents a novel method using distance functions and ChatGPT's few-shot prompting to controllably manipulate text data for enhanced retrieval accuracy.

Findings

01 Significantly improves audio-text retrieval performance

02 Outperforms traditional text augmentation methods

03 Uses distance-based control for text diversity

Abstract

There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, audio-text paired datasets often lack rich expression in the text data compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions despite different audio samples. Therefore, under many-to-one mapping conditions, audio-text datasets lead to poor performance on retrieval tasks. In this paper, we propose a novel approach to tackle the data imbalance problem in the audio-language retrieval task. To overcome this limitation, we introduce a method that employs a distance sampling-based paraphraser leveraging ChatGPT, utilizing a distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same…
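
The abstract is cut off before it describes the procedure, so the following is only an illustrative sketch of the general idea of distance-controlled few-shot prompting, not the authors' method. It assumes a token-level Jaccard distance as the distance function (the paper's actual function is not stated in the text above), and the names `jaccard_distance`, `sample_pairs_by_distance`, and `build_few_shot_prompt` are hypothetical. The sketch only builds a prompt string and does not call any ChatGPT API.

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Token-level Jaccard distance in [0, 1]. Assumed here for illustration only."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty sentences are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def sample_pairs_by_distance(captions, target, tolerance=0.15, k=2):
    """Return up to k caption pairs whose distance lies within `tolerance` of `target`."""
    scored = []
    for a, b in combinations(captions, 2):
        d = jaccard_distance(a, b)
        if abs(d - target) <= tolerance:
            scored.append((a, b, d))
    # Pairs closest to the target distance come first.
    scored.sort(key=lambda item: abs(item[2] - target))
    return scored[:k]


def build_few_shot_prompt(examples, sentence, target):
    """Assemble a few-shot paraphrasing prompt from (source, paraphrase, distance) examples."""
    lines = ["Paraphrase the input so that it differs from the original by the given distance."]
    for src, tgt, d in examples:
        lines.append(f"Input: {src}\nDistance: {d:.2f}\nOutput: {tgt}")
    # The final block is left open for the language model to complete.
    lines.append(f"Input: {sentence}\nDistance: {target:.2f}\nOutput:")
    return "\n\n".join(lines)


if __name__ == "__main__":
    captions = ["a dog barks", "a dog barks loudly", "rain falls on a roof"]
    examples = sample_pairs_by_distance(captions, target=0.25, tolerance=0.1)
    print(build_few_shot_prompt(examples, "a car passes by", target=0.25))
```

Under these assumptions, varying `target` changes which example pairs are shown to the model, which is one plausible way a distance value could steer how far the generated paraphrases drift from the original caption.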

Taxonomy

Methods: Sparse Evolutionary Training