Reliable Pseudo-labeling via Optimal Transport with Attention for Short Text Clustering
Zhihao Yao

TL;DR
This paper introduces POTA, a novel short text clustering framework that uses optimal transport and attention mechanisms to generate reliable pseudo-labels, improving discriminative representation learning and clustering accuracy.
Contribution
POTA integrates an attention mechanism with optimal transport to produce reliable pseudo-labels for short text clustering, handling data imbalance and enhancing discriminative features.
Findings
POTA outperforms state-of-the-art clustering methods.
The attention mechanism captures semantic relationships effectively.
Adaptive estimation of cluster distributions improves handling of imbalanced datasets.
Abstract
Short text clustering has gained significant attention in the data mining community. However, the limited information contained in short texts often leads to low-discriminative representations, which makes clustering more difficult. This paper proposes a novel short text clustering framework, called Reliable Pseudo-labeling via Optimal Transport with Attention for Short Text Clustering (POTA), that generates reliable pseudo-labels to aid discriminative representation learning for clustering. Specifically, POTA first implements an instance-level attention mechanism to capture the semantic relationships among samples, which are then incorporated as a semantic consistency regularization term into an optimal transport (OT) problem. By solving this OT problem, we obtain reliable pseudo-labels that simultaneously account for…
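To make the idea of pseudo-labeling via optimal transport more concrete, here is a minimal sketch of the generic entropic-OT (Sinkhorn-Knopp) assignment step that this family of methods builds on. It is not POTA's formulation: the attention-based semantic consistency term and the adaptive cluster distribution estimate are omitted, and a uniform cluster marginal is assumed instead. The function name, the `epsilon` and `n_iters` parameters, and the toy `logits` are all hypothetical.

```python
import math


def sinkhorn_pseudo_labels(logits, epsilon=0.1, n_iters=200):
    """Soft pseudo-labels from cluster scores via entropic OT (Sinkhorn-Knopp).

    logits: n rows of k cluster scores (hypothetical model output).
    Assumes a uniform cluster marginal, which POTA does NOT assume.
    Returns n rows of k soft assignments; each row sums to 1.
    """
    n, k = len(logits), len(logits[0])
    # Gibbs kernel exp(score / epsilon), shifted by the global max for stability.
    top = max(max(row) for row in logits)
    kernel = [[math.exp((x - top) / epsilon) for x in row] for row in logits]
    row_mass, col_mass = 1.0 / n, 1.0 / k  # target marginals of the plan
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [col_mass / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Normalize each row so it reads as a distribution over clusters.
    return [[p / sum(row) for p in row] for row in plan]


# Toy scores: every sample prefers cluster 0, with decreasing confidence.
logits = [[2.0, 1.0], [1.8, 1.0], [1.5, 1.0], [1.2, 1.0]]
Q = sinkhorn_pseudo_labels(logits)
labels = [max(range(len(row)), key=row.__getitem__) for row in Q]
print(labels)  # -> [0, 0, 1, 1]
```

A plain argmax over the toy scores would put all four samples in cluster 0. The marginal constraint instead moves the two least confident samples to cluster 1, which is the degenerate-solution safeguard that OT-based pseudo-labeling provides. With a fixed uniform marginal this safeguard can hurt on imbalanced data, which is the case the adaptive estimation described in the findings is meant to address.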
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Data Management and Algorithms · Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · Contrastive Learning
