A Deep Learning-Based Approach for Measuring the Domain Similarity of Persian Texts
Hossein Keshavarz, Shohreh Tabatabayi Seifi, Mohammad Izadi

TL;DR
This paper introduces a deep learning method that uses word2vec embeddings and neural networks to measure the domain similarity of Persian texts, reaching an F1 score of 0.9865 on a custom advertisement dataset.
Contribution
It presents a novel deep learning framework specifically designed for Persian text domain similarity measurement, including dataset creation and model training.
Findings
Achieved an F1 score of 0.9865 on the similarity scoring task.
Developed a dataset with annotated similarity scores for Persian advertisements.
Demonstrated the effectiveness of deep neural networks in Persian text similarity assessment.
Abstract
In this paper, we propose a novel approach for measuring the degree of similarity between the categories of two pieces of Persian text, each published as the description of a separate advertisement. We built a dataset for this work from a collection of advertisements posted on an e-commerce website. We generated a large number of text pairs from this collection and assigned each pair a score from 0 to 3, which indicates the degree of similarity between the domains of the pair. In this work, we represent words with word embedding vectors derived from word2vec. Deep neural network models are then used to represent the texts. Finally, we concatenate the absolute difference and the element-wise product of the two text representations and feed the result to a fully-connected neural network, which produces a probability distribution vector over the scores of the pair. Through a supervised learning approach,…
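The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the text vectors are random stand-ins for the output of the neural text encoders, the weights are untrained, the multiplication is read as an element-wise product, and the single ReLU hidden layer, the dimensions, and all names (pair_features, score_distribution, W1, W2) are assumptions made for illustration only.

```python
import numpy as np


def pair_features(u, v):
    """Concatenate the absolute difference and the element-wise product."""
    return np.concatenate([np.abs(u - v), u * v])


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def score_distribution(u, v, W1, b1, W2, b2):
    """Map two text vectors to a probability distribution over scores 0..3."""
    # Hidden layer with ReLU (assumed; the abstract does not specify it).
    h = np.maximum(0.0, W1 @ pair_features(u, v) + b1)
    return softmax(W2 @ h + b2)


# Hypothetical sizes: 8-dim text vectors, 16 hidden units, 4 score classes.
rng = np.random.default_rng(0)
dim, hidden, n_scores = 8, 16, 4

# Random stand-ins for the two encoded advertisement descriptions.
u = rng.normal(size=dim)
v = rng.normal(size=dim)

# Untrained, randomly initialised weights (illustration only).
W1 = rng.normal(scale=0.1, size=(hidden, 2 * dim))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(n_scores, hidden))
b2 = np.zeros(n_scores)

probs = score_distribution(u, v, W1, b1, W2, b2)
predicted_score = int(np.argmax(probs))
```

Here probs has one entry per similarity score (0 to 3) and sums to 1; in a trained model the predicted score would be the index of the largest entry. Both features are symmetric in the two inputs, so swapping the texts leaves the feature vector unchanged.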
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
