Towards Real-world Scenario: Imbalanced New Intent Discovery
Shun Zhang, Chaoran Yan, Jian Yang, Jiaheng Liu, Ying Mo, Jiaqi Bai, Tongliang Li, Zhoujun Li

TL;DR
This paper introduces the imbalanced new intent discovery (i-NID) task and a new benchmark to better reflect real-world long-tailed intent data, proposing a robust model that outperforms existing methods in identifying both known and novel user intents.
Contribution
It defines the i-NID task, creates the ImbaNID-Bench benchmark, and proposes the ImbaNID model to effectively discover intents in imbalanced, long-tailed distributions.
Findings
ImbaNID outperforms previous methods on the new benchmark.
The benchmark covers diverse real-world scenarios.
The model effectively handles long-tailed intent data.
Abstract
New Intent Discovery (NID) aims at detecting known and previously undefined categories of user intent by utilizing limited labeled and massive unlabeled data. Most prior work operates under the unrealistic assumption that the distribution of both familiar and new intent classes is uniform, overlooking the skewed and long-tailed distributions frequently encountered in real-world scenarios. To bridge the gap, our work introduces the imbalanced new intent discovery (i-NID) task, which seeks to identify familiar and novel intent categories within long-tailed distributions. A new benchmark (ImbaNID-Bench) comprising three datasets is created to simulate real-world long-tail distributions. ImbaNID-Bench ranges from broad cross-domain to specific single-domain intent categories, providing a thorough representation of practical use cases. Besides, a robust baseline model ImbaNID is…
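To make "long-tailed distribution" concrete, here is a minimal sketch of one common way to generate imbalanced per-class sample counts: exponential decay from the most frequent (head) class to the rarest (tail) class. This is an illustrative assumption, not the construction used for ImbaNID-Bench, which this summary does not describe. The function name `long_tailed_counts` and the parameters `max_count` and `imbalance_ratio` (head count divided by tail count) are hypothetical.

```python
from typing import List


def long_tailed_counts(num_classes: int, max_count: int, imbalance_ratio: float) -> List[int]:
    """Return per-class sample counts that decay exponentially from head to tail.

    Assumed convention (not taken from the paper): the head class keeps
    `max_count` samples and the tail class keeps about
    `max_count / imbalance_ratio` samples.
    """
    if num_classes < 1 or max_count < 1 or imbalance_ratio < 1:
        raise ValueError("num_classes, max_count and imbalance_ratio must all be >= 1")
    if num_classes == 1:
        return [max_count]

    counts = []
    for k in range(num_classes):
        # frac runs from 0.0 (head class) to 1.0 (tail class).
        frac = k / (num_classes - 1)
        # Keep at least one sample so that no class disappears entirely.
        counts.append(max(1, round(max_count * imbalance_ratio ** (-frac))))
    return counts


# Five hypothetical intent classes, head class with 100 utterances, ratio 10.
counts = long_tailed_counts(num_classes=5, max_count=100, imbalance_ratio=10)
print(counts)
```

With `imbalance_ratio=1` every class keeps `max_count` samples, which corresponds to the uniform assumption the abstract criticizes; larger ratios produce a progressively heavier tail.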
Taxonomy
Topics: Advanced Database Systems and Queries
