ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian
Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak

TL;DR
ViraPart is a framework that uses ParsBERT-based models to refine Persian text through ZWNJ recognition, punctuation restoration, and Ezafe construction, reaching macro F1 scores of 92.13% to 98.50% across the three tasks.
Contribution
This work introduces the first integrated framework for Persian text refinement combining multiple linguistic tasks using deep learning.
Findings
Achieved macro-averaged F1 scores of 96.90% for ZWNJ recognition, 92.13% for punctuation restoration, and 98.50% for Ezafe construction.
Demonstrated effectiveness of combined models for Persian text refinement.
Improved understanding and precision in Persian NLP tasks.
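The scores above are reported as macro-averaged F1, which weights every label equally regardless of how often it occurs. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy label names (`"O"`, `"ZWNJ"`) are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores over two aligned label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one label is required")

    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical per-token labels: "ZWNJ" marks a token followed by a ZWNJ, "O" otherwise.
gold = ["O", "O", "O", "ZWNJ"]
pred = ["O", "O", "ZWNJ", "ZWNJ"]
score = macro_f1(gold, pred)
```

In this toy case accuracy is 0.75, while the macro average sits lower because the rarer `"ZWNJ"` label has a false positive and counts as much as the frequent `"O"` label.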
Abstract
Persian is an inflectional subject-object-verb language, which makes it a more ambiguous language. Techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction make text more understandable and precise. Most work on Persian addresses these techniques individually, but we believe that all of these tasks are necessary for text refinement in Persian. In this work, we propose ViraPart, a framework with ParsBERT embedded at its core for text clarification. First, we use a BERT variant for Persian followed by a classifier layer for the classification procedures. Next, we combine the models' outputs to produce clear text. In the end, the proposed model for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction performs the averaged F1 macro…
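The abstract describes combining the outputs of the three models into clear text. One way this merge step could look is sketched below, assuming each model emits one label per token. The label scheme, the function name `refine`, and the use of a kasra as the only Ezafe marker are simplifying assumptions, not details taken from the paper.

```python
ZWNJ = "\u200c"   # zero-width non-joiner
KASRA = "\u0650"  # Arabic kasra, used here as a simplified Ezafe marker (assumption)


def refine(tokens, zwnj, ezafe, punct):
    """Merge three per-token label sequences into one refined string.

    zwnj[i]  truthy -> join token i to token i+1 with a ZWNJ instead of a space
    ezafe[i] truthy -> append the Ezafe marker to token i
    punct[i]        -> punctuation mark to append after token i, or "O" for none
    """
    if not (len(tokens) == len(zwnj) == len(ezafe) == len(punct)):
        raise ValueError("all sequences must have the same length")

    pieces = []
    for i, token in enumerate(tokens):
        piece = token
        if ezafe[i]:
            piece += KASRA
        if punct[i] != "O":
            piece += punct[i]
        pieces.append(piece)
        # Separator between this token and the next one, if there is a next one.
        if i < len(tokens) - 1:
            pieces.append(ZWNJ if zwnj[i] else " ")
    return "".join(pieces)


# Hypothetical model outputs for a four-token input.
tokens = ["کتاب", "خوب", "می", "خوانم"]
refined = refine(
    tokens,
    zwnj=[0, 0, 1, 0],
    ezafe=[1, 0, 0, 0],
    punct=["O", "O", "O", "."],
)
```

Keeping the merge as a separate, rule-based step means each classifier can be trained and evaluated independently, which matches the per-task scores reported for the framework.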
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Softmax · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout
