# Extracting Bilingual Persian Italian Lexicon from Comparable Corpora Using Different Types of Seed Dictionaries

**Authors:** Ebrahim Ansari, M.H. Sadreddini, Lucio Grandinetti, Mahsa Radinmehr, Ziba Khosravan, and Mehdi Sheikhalishahi

arXiv: 1701.08340 · 2019-09-23

## TL;DR

This paper explores methods for extracting a bilingual Persian-Italian lexicon from comparable corpora using various seed dictionaries, proposing novel combination and weighting models to improve accuracy and efficiency.

## Contribution

It introduces new models for combining multiple seed dictionaries and a weighting system to enhance bilingual lexicon extraction from comparable corpora.

## Key findings

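- Two combination models merge different seed dictionaries to produce a better and more accurate lexicon
- A new weighting system is proposed to improve the extracted lexicon
- Experiments on comparable corpora with differing degrees of comparability show the efficiency of the proposed models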

## Abstract

Bilingual dictionaries are very important in various fields of natural language processing. In recent years, several approaches to extracting new bilingual lexicons from non-parallel (comparable) corpora have been proposed. Almost all use a small existing dictionary or other resources to make an initial list called the "seed dictionary". In this paper, we discuss the use of different types of dictionaries as the initial starting list for creating a bilingual Persian-Italian lexicon from a comparable corpus. Our experiments apply state-of-the-art techniques to three different seed dictionaries: an existing dictionary, a dictionary created with a pivot-based schema, and a dictionary extracted from a small Persian-Italian parallel text. The interesting challenge of our approach is to find a way to combine the different dictionaries in order to produce a better and more accurate lexicon. To combine the seed dictionaries, we propose two different combination models and examine their effect on various comparable corpora that have differing degrees of comparability. We conclude with a proposal for a new weighting system to improve the extracted lexicon. The experimental results produced by our implementation show the efficiency of our proposed models.
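As a rough illustration of two ideas named in the abstract, the sketch below builds a pivot-based seed dictionary and then merges it with an existing one. It is not the authors' method: the abstract does not specify the pivot language or either combination model, so the use of English as the pivot, the plain set-union merge, the function names `pivot_dictionary` and `combine_union`, and the transliterated toy entries are all assumptions made for this example.

```python
from collections import defaultdict


def pivot_dictionary(src_to_pivot, pivot_to_tgt):
    """Compose source->pivot and pivot->target dictionaries into source->target.

    Hypothetical helper: every target word reachable through any pivot
    translation of a source word is kept as a candidate translation.
    """
    result = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            for tgt_word in pivot_to_tgt.get(pivot_word, ()):
                result[src_word].add(tgt_word)
    return dict(result)


def combine_union(*dictionaries):
    """Merge seed dictionaries by taking the union of their translations.

    Assumption: a simple union stands in for the paper's combination
    models, which the abstract does not describe.
    """
    merged = defaultdict(set)
    for dictionary in dictionaries:
        for src_word, tgt_words in dictionary.items():
            merged[src_word].update(tgt_words)
    return dict(merged)


# Toy, transliterated entries (hypothetical data, not from the paper).
fa_en = {"ketab": {"book"}, "khaneh": {"house", "home"}}
en_it = {"book": {"libro"}, "house": {"casa"}, "home": {"casa"}}
existing = {"ketab": {"libro"}, "ab": {"acqua"}}

pivot = pivot_dictionary(fa_en, en_it)
seed = combine_union(existing, pivot)
```

Here `pivot` holds the Persian-Italian pairs reachable through English, and `seed` adds the entries of the existing dictionary. A union maximizes coverage but also keeps any noisy pivot translations, which is one reason a weighting scheme over dictionary sources, as the abstract proposes, can be useful.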

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.08340/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1701.08340/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1701.08340/full.md

---
Source: https://tomesphere.com/paper/1701.08340