Persian Causality Corpus (PerCause) and the Causality Detection Benchmark
Zeinab Rahimi, Mehrnoush ShamsFard

TL;DR
This paper introduces a new human-annotated corpus for Persian causality detection, trains a system that detects the boundaries of causal elements, and establishes a benchmark comparing three machine learning and two deep learning methods.
Contribution
It provides the first human-annotated Persian causality corpus and a causality detection benchmark covering three machine learning and two deep learning methods.
Findings
The CRF classifier achieved the best overall F-measure, 0.76
The Bi-LSTM-CRF achieved the best accuracy, 91.4%
Neither family wins on both metrics: the CRF leads on F-measure, while the deep Bi-LSTM-CRF leads on accuracy
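The Findings pair an F-measure with an accuracy figure, but the page does not say how either is computed. Assuming, purely for illustration, that accuracy is counted per token and F-measure over exact-match labelled spans, the sketch shows how a one-token boundary error costs little accuracy but a whole span in F-measure. This is one possible reason such metrics diverge, not an explanation of the paper's numbers; the tag names (`B-CAUSE`, `I-EFFECT`, `B-MARK` and so on) and the toy sequences are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labelled spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # a matching I- tag extends the open span
        if label is not None:  # any other tag ends the open span
            spans.add((label, start, i))
        # B-X, or an ill-formed I-X, opens a new span; O opens nothing
        label = None if tag == "O" else tag[2:]
        start = i
    return spans


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def span_f1(gold: List[str], pred: List[str]) -> float:
    """F-measure over exact (label, start, end) span matches."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred_spans), tp / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Toy example: cause = tokens 0-2, causal mark = token 3, effect = tokens 4-7
gold = ["B-CAUSE", "I-CAUSE", "I-CAUSE", "B-MARK",
        "B-EFFECT", "I-EFFECT", "I-EFFECT", "I-EFFECT", "O", "O"]
# The prediction cuts both the cause and the effect one token short
pred = ["B-CAUSE", "I-CAUSE", "O", "B-MARK",
        "B-EFFECT", "I-EFFECT", "I-EFFECT", "O", "O", "O"]

print(token_accuracy(gold, pred))     # 0.8
print(round(span_f1(gold, pred), 3))  # 0.333
```

Eight of the ten tags match, yet only the causal mark is recovered exactly, so span precision and recall are both 1/3.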
Abstract
Recognizing causal elements and causal relations in text is a challenging problem in natural language processing, particularly in low-resource languages such as Persian. In this research, we prepare a human-annotated causality corpus for Persian that consists of 4446 sentences and 5128 causal relations; for each relation, three labels are specified: cause, effect, and, where applicable, causal mark. We use this corpus to train a system that detects the boundaries of causal elements. We also present a causality detection benchmark for three machine learning methods and two deep learning systems based on this corpus. Performance evaluations show that our best overall result is obtained by the CRF classifier, with an F-measure of 0.76, and the best accuracy is obtained by the Bi-LSTM-CRF deep learning method, at 91.4%.
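The abstract describes each relation as up to three labelled spans (cause, effect and an optional causal mark) and a system that detects their boundaries. Boundary detection of this kind is commonly cast as token-level BIO tagging. The page does not state the tagging scheme or file format PerCause actually uses, so this is a minimal sketch assuming token-offset spans, a hypothetical `CausalRelation` record, hypothetical labels `CAUSE`, `EFFECT` and `MARK`, and an English stand-in sentence.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # token offsets [start, end), end exclusive


@dataclass
class CausalRelation:
    """Hypothetical in-memory form of one annotated relation."""
    cause: Span
    effect: Span
    mark: Optional[Span] = None  # causal mark is annotated only when present


def relation_to_bio(n_tokens: int, rel: CausalRelation) -> List[str]:
    """Encode one relation as a BIO tag sequence over n_tokens tokens."""
    tags = ["O"] * n_tokens
    parts = [("CAUSE", rel.cause), ("EFFECT", rel.effect)]
    if rel.mark is not None:
        parts.append(("MARK", rel.mark))
    for label, (start, end) in parts:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"invalid {label} span {(start, end)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError(f"{label} span overlaps another element")
        tags[start] = f"B-{label}"  # first token of the element
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the element
    return tags


tokens = ["Heavy", "rain", "caused", "flooding", "in", "the", "city"]
rel = CausalRelation(cause=(0, 2), effect=(3, 7), mark=(2, 3))
print(relation_to_bio(len(tokens), rel))
# ['B-CAUSE', 'I-CAUSE', 'B-MARK', 'B-EFFECT', 'I-EFFECT', 'I-EFFECT', 'I-EFFECT']
```

Because the corpus contains more relations (5128) than sentences (4446), some sentences carry more than one relation. This sketch emits one tag sequence per relation, which sidesteps overlaps between relations; merging all relations of a sentence into a single sequence is another option with its own trade-offs.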
Taxonomy
Methods: Conditional Random Field
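Both systems singled out in the Findings, the CRF classifier and the Bi-LSTM-CRF, label sequences with a conditional random field. Assuming a linear-chain CRF over BIO tags (the page gives no features, tag set or training setup), prediction picks the tag sequence that maximises the sum of per-token emission scores and tag-to-tag transition scores, and Viterbi decoding finds that sequence exactly. This is a minimal sketch: the scores are hand-set toy numbers rather than learned weights, the tag names are hypothetical, and training is omitted.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Highest-scoring label path under a linear-chain CRF score:
    sum of emission scores plus (previous, current) transition scores.
    Missing transition pairs score 0."""
    if not emissions:
        return []
    # best[i][y]: best score of any path over tokens 0..i ending in label y
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []  # back[i-1][y]: best predecessor of y at i
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[i][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "B-CAUSE", "I-CAUSE"]
transitions = {("O", "I-CAUSE"): -100.0, ("B-CAUSE", "I-CAUSE"): 0.5}
emissions = [
    {"O": 1.0, "B-CAUSE": 0.0, "I-CAUSE": 0.0},
    {"O": 0.0, "B-CAUSE": 0.0, "I-CAUSE": 2.0},
]
# Per-token argmax would give ['O', 'I-CAUSE'], an ill-formed BIO sequence
print(viterbi(emissions, transitions, labels))  # ['B-CAUSE', 'I-CAUSE']
```

The large negative O-to-I-CAUSE transition stands in for what a trained CRF learns from data: an inside tag should not follow O. Here it overrides the per-token preference and yields a well-formed span.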
