Identifying Semantically Duplicate Questions Using Data Science Approach: A Quora Case Study
Navedanjum Ansari, Rajesh Sharma

TL;DR
This study compares machine learning and deep learning techniques for identifying semantically duplicate questions on Quora, demonstrating improved accuracy over previous methods with the best model achieving 85.82% accuracy.
Contribution
The paper introduces a comprehensive comparison of machine learning and deep learning models for duplicate question detection on Quora, highlighting the effectiveness of character-level features and deep neural network architectures.
Findings
XGBoost with character-level TF-IDF features outperforms previously studied models.
Deep neural networks achieve higher accuracy than traditional machine learning models.
Best model attains 85.82% accuracy, close to state-of-the-art results.
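To make the character-level feature idea in the first finding concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the trigram size, the smoothed IDF formula, the toy questions, and all function names are assumptions chosen for illustration. It builds character n-gram TF-IDF vectors for a few questions and compares them with cosine similarity, the kind of signal that could be fed, alongside other features, into a classifier such as XGBoost.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fit_idf(documents, n=3):
    """Compute a smoothed inverse document frequency per character n-gram."""
    doc_freq = Counter()
    for doc in documents:
        # Count each n-gram once per document.
        doc_freq.update(set(char_ngrams(doc, n)))
    total = len(documents)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(text, idf, n=3):
    """Map a string to a sparse {n-gram: tf * idf} vector."""
    counts = Counter(char_ngrams(text, n))
    return {g: c * idf[g] for g, c in counts.items() if g in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy questions (hypothetical, not taken from the Quora dataset).
questions = [
    "How do I learn Python quickly?",
    "What is the fastest way to learn Python?",
    "Why is the sky blue?",
]

idf = fit_idf(questions)
vectors = [tfidf_vector(q, idf) for q in questions]

sim_related = cosine(vectors[0], vectors[1])    # two questions about learning Python
sim_unrelated = cosine(vectors[0], vectors[2])  # unrelated pair

print(f"related pair:   {sim_related:.3f}")
print(f"unrelated pair: {sim_unrelated:.3f}")
```

In practice a library vectorizer (for example scikit-learn's `TfidfVectorizer` with `analyzer="char"`) would replace the hand-rolled functions. The sketch only shows why character n-grams can capture overlap between differently worded questions.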
Abstract
Identifying semantically identical questions on question-and-answer social media platforms like Quora is exceptionally significant to ensure that the quality and the quantity of content presented to users are based on the intent of the question, thus enriching the overall user experience. Detecting duplicate questions is a challenging problem because natural language is very expressive, and a unique intent can be conveyed using different words, phrases, and sentence structures. Machine learning and deep learning methods are known to have accomplished superior results over traditional natural language processing techniques in identifying similar texts. In this paper, taking Quora as our case study, we explored and applied different machine learning and deep learning techniques to the task of identifying duplicate questions on Quora's dataset. By using feature engineering, feature…
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Natural Language Processing Techniques
Methods: Convolution · Batch Normalization · GloVe Embeddings
