Practice in Synonym Extraction at Large Scale
Liangliang Cao, Chang Wang

TL;DR
This paper introduces a dataset of 3.4 million synonym/non-synonym pairs and a feature-learning neural network for synonym extraction that outperforms SVM-based methods built on hand-assigned features.
Contribution
It presents a new large dataset and a feature learning neural network that outperforms existing SVM-based approaches for synonym extraction.
Findings
A neural network with feature learning outperforms SVMs.
The proposed model achieves a 97% relative improvement over the baseline.
The large dataset captures real-world synonym extraction challenges.
Abstract
Synonym extraction is an important task in natural language processing and is often used as a submodule in query expansion, question answering, and other applications. An automatic synonym extractor is highly preferred for large-scale applications. Previous studies in synonym extraction are mostly limited to small-scale datasets. In this paper, we build a large dataset with 3.4 million synonym/non-synonym pairs to capture the challenges in real-world scenarios. We propose (1) a new cost function to accommodate the unbalanced learning problem, and (2) a feature-learning-based deep neural network to model the complicated relationships in synonym pairs. We compare several different approaches based on SVMs and neural networks, and find that a novel feature-learning-based neural network outperforms the methods with hand-assigned features. Specifically, the best performance of our model surpasses the…
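The abstract mentions a cost function for the unbalanced learning problem but does not spell it out on this page. As a rough illustration of the general idea only, the sketch below shows one common way to handle class imbalance: a class-weighted binary cross-entropy that up-weights the rarer synonym class. This is an assumed stand-in, not the authors' cost function; the names `weighted_log_loss` and `balanced_pos_weight` are hypothetical.

```python
import math


def balanced_pos_weight(labels):
    """Return n_negative / n_positive, a common heuristic weight for the rare class.

    Assumption for illustration: label 1 = synonym pair, 0 = non-synonym pair.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    return n_neg / n_pos


def weighted_log_loss(labels, probs, pos_weight):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight.

    Illustrative only; not the cost function proposed in the paper.
    """
    eps = 1e-12  # clamp probabilities so log() never sees 0
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(labels)


if __name__ == "__main__":
    labels = [1, 0, 0, 0]          # one synonym pair among four candidates
    probs = [0.6, 0.2, 0.1, 0.3]   # made-up model outputs
    w = balanced_pos_weight(labels)
    print(w, weighted_log_loss(labels, probs, w))
```

With a weight above 1, a missed synonym pair costs more than a misclassified non-synonym pair, which keeps a classifier from simply predicting "non-synonym" everywhere when negatives dominate.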
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
