How DDAIR you? Disambiguated Data Augmentation for Intent Recognition

Galo Castillo-López; Alexis Lombard; Nasredine Semmar; Gaël de Chalendar

arXiv:2601.11234 · cs.CL · January 19, 2026

Open Access

TL;DR

This paper introduces DDAIR, a method using Sentence Transformers to detect and reduce ambiguous data generated by LLMs for intent recognition, improving classification in low-resource settings.

Contribution

The paper proposes a novel approach combining sentence embeddings and iterative re-generation to mitigate ambiguity in LLM-generated data for intent detection.

Findings

01. Sentence embeddings effectively identify ambiguous examples.

02. Iterative re-generation reduces ambiguity in synthetic data.

03. The results suggest potential to improve intent classification in low-resource scenarios, particularly where intents are loosely or broadly defined.

Abstract

Large Language Models (LLMs) are effective for data augmentation in classification tasks like intent detection. In some cases, they inadvertently produce examples that are ambiguous with regard to untargeted classes. We present DDAIR (Disambiguated Data Augmentation for Intent Recognition) to mitigate this problem. We use Sentence Transformers to detect ambiguous class-guided augmented examples generated by LLMs for intent recognition in low-resource scenarios. We identify synthetic examples that are semantically more similar to another intent than to their target one. We also provide an iterative re-generation method to mitigate such ambiguities. Our findings show that sentence embeddings effectively help to (re)generate less ambiguous examples, and suggest promising potential to improve classification performance in scenarios where intents are loosely or broadly defined.
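The detection and re-generation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each intent is represented by the mean embedding of its seed examples and that similarity is cosine similarity, neither of which the abstract specifies. The names `intent_centroids`, `is_ambiguous`, `regenerate_until_clear`, `generate`, `embed`, and `max_rounds` are all hypothetical. In practice `embed` could wrap a Sentence Transformers model and `generate` an LLM call; here both are plain callables so the sketch stays self-contained.

```python
import numpy as np


def intent_centroids(seed_vectors, seed_labels):
    """Return one unit-length mean vector per intent (assumed representation)."""
    vectors = np.asarray(seed_vectors, dtype=float)
    labels = np.asarray(seed_labels)
    centroids = {}
    for intent in sorted(set(seed_labels)):
        mean = vectors[labels == intent].mean(axis=0)
        centroids[intent] = mean / np.linalg.norm(mean)
    return centroids


def closest_intent(vector, centroids):
    """Return the intent whose centroid has the highest cosine similarity."""
    vector = np.asarray(vector, dtype=float)
    unit = vector / np.linalg.norm(vector)
    return max(centroids, key=lambda intent: float(unit @ centroids[intent]))


def is_ambiguous(vector, target, centroids):
    """Flag an example that is closer to another intent than to its target."""
    return closest_intent(vector, centroids) != target


def regenerate_until_clear(generate, embed, target, centroids, max_rounds=3):
    """Ask the generator again until the example is no longer ambiguous.

    `generate(target)` returns a candidate utterance and `embed(text)`
    returns its embedding; both are hypothetical caller-supplied callables.
    Returns None if every attempt within `max_rounds` stays ambiguous.
    """
    for _ in range(max_rounds):
        text = generate(target)
        if not is_ambiguous(embed(text), target, centroids):
            return text
    return None
```

Returning `None` after `max_rounds` failed attempts leaves it to the caller to decide whether to drop the example or keep the last candidate; the abstract does not say which the authors do.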

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Topic Modeling