Evaluating BERT and ParsBERT for Analyzing Persian Advertisement Data
Ali Mehrban, Pegah Ahadian

TL;DR
This paper compares the performance of mBERT and ParsBERT models in analyzing Persian advertisement data from the Divar marketplace, highlighting challenges and opportunities in applying language models to low-resource languages.
Contribution
The study provides a comparative analysis of mBERT and ParsBERT on Persian text data, including fine-tuning details and performance insights for low-resource language processing.
Findings
ParsBERT outperforms mBERT in Persian text analysis
Data cleaning and normalization improve model accuracy
Insights into challenges of low-resource language NLP
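The normalization finding can be made concrete with a small sketch. The listing does not describe the authors' cleaning steps beyond naming the Hazm library, so the code below is not their pipeline and not Hazm's implementation. It is a standard-library-only illustration of the kind of character-level normalization commonly applied to Persian text before tokenization: mapping Arabic letter variants and digits to their Persian forms, dropping diacritics and tatweel, and collapsing whitespace. The function name `normalize_persian` and the specific rule set are assumptions made for illustration.

```python
import re

# Hypothetical rule set, chosen for illustration only.
# Arabic code points that often appear in Persian text, mapped to Persian forms.
_CHAR_MAP = {
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian kaf
}
# Arabic-Indic digits (U+0660..U+0669) -> Persian digits (U+06F0..U+06F9)
_CHAR_MAP.update({chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)})
_TRANSLATION = str.maketrans(_CHAR_MAP)

# Short-vowel diacritics (U+064B..U+0652) and tatweel (U+0640)
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
_WHITESPACE = re.compile(r"\s+")


def normalize_persian(text: str) -> str:
    """Apply a few character-level normalizations to Persian text."""
    text = text.translate(_TRANSLATION)    # unify letter and digit variants
    text = _DIACRITICS.sub("", text)       # strip diacritics and tatweel
    return _WHITESPACE.sub(" ", text).strip()  # collapse runs of whitespace


if __name__ == "__main__":
    # Arabic kaf and yeh plus a doubled space, as often seen in scraped ads
    print(normalize_persian("\u0643\u062A\u0627\u0628  \u0639\u0644\u064A"))
```

Unifying these variants matters for subword tokenizers such as WordPiece, because two spellings of the same word that differ only in code points would otherwise be split into different token sequences.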
Abstract
This paper discusses the impact of the Internet on modern trading and the importance of data generated from these transactions for organizations to improve their marketing efforts. The paper uses the example of Divar, an online marketplace for buying and selling products and services in Iran, and presents a competition to predict the percentage of a car sales ad that would be published on the Divar website. Since the dataset provides a rich source of Persian text data, the authors use the Hazm library, a Python library designed for processing Persian text, and two state-of-the-art language models, mBERT and ParsBERT, to analyze it. The paper's primary objective is to compare the performance of mBERT and ParsBERT on the Divar dataset. The authors provide some background on data mining, Persian language, and the two language models, examine the dataset's composition and statistical…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
Methods: Lib · Multi-Head Attention · Attention Is All You Need · Softmax · Adam · Layer Normalization · Linear Layer · Dropout · WordPiece · Weight Decay
