Bilingual Terminology Extraction from Comparable E-Commerce Corpora
Hao Jia, Shuqin Gu, Yuqi Zhang, Xiangyu Duan

TL;DR
This paper introduces a novel framework for extracting bilingual e-commerce terminologies from comparable data, leveraging cross-lingual pre-training to improve accuracy over existing methods.
Contribution
The paper presents a new approach that utilizes cross-lingual pre-training and deep semantic relationships to extract bilingual terminologies from comparable corpora in e-commerce.
Findings
Achieves significantly better performance than strong baselines.
Extracts accurate bilingual terminologies from comparable data, which is abundant where parallel data is scarce.
Evaluated on various language pairs in the e-commerce domain.
Abstract
Bilingual terminologies are important machine translation resources in e-commerce and are usually either manually translated or automatically extracted from parallel data. Human translation is costly, and e-commerce parallel corpora are very scarce. However, comparable data in different languages within the same commodity field is abundant. In this paper, we propose a novel framework for extracting bilingual e-commerce terminologies from comparable data. Benefiting from cross-lingual pre-training in e-commerce, our framework can make full use of the deep semantic relationship between a source-side terminology and a target-side sentence to extract the corresponding target terminology. Experimental results on various language pairs show that our approaches achieve significantly better performance than various strong baselines.
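The abstract frames extraction as pairing a source-side terminology with a target-side sentence and picking out the corresponding target terminology. The abstract does not describe the model itself, so the following is only a toy sketch of that framing, not the authors' method. The span enumeration, the function names (`candidate_spans`, `extract_target_term`, `toy_score`), and the tiny `TOY_LEXICON` are all assumptions made for illustration; the lexicon-overlap scorer merely stands in for whatever a cross-lingually pre-trained model would provide.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical toy lexicon; a stand-in for a learned cross-lingual model.
TOY_LEXICON = {"wireless": "inalámbricos", "headphones": "auriculares"}


def candidate_spans(n_tokens: int, max_len: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) indices of every contiguous span of up to max_len tokens."""
    for start in range(n_tokens):
        for end in range(start + 1, min(start + max_len, n_tokens) + 1):
            yield start, end


def toy_score(source_term: str, candidate: str) -> float:
    """Jaccard overlap between the lexicon-translated source words and the candidate words."""
    translated = {TOY_LEXICON.get(w, w) for w in source_term.lower().split()}
    cand_words = set(candidate.lower().split())
    return len(translated & cand_words) / len(translated | cand_words)


def extract_target_term(
    source_term: str,
    target_sentence: str,
    score: Callable[[str, str], float],
    max_len: int = 4,
) -> Optional[str]:
    """Return the highest-scoring span of the target sentence, or None if it is empty."""
    tokens = target_sentence.split()
    best = max(
        candidate_spans(len(tokens), max_len),
        key=lambda span: score(source_term, " ".join(tokens[span[0]:span[1]])),
        default=None,
    )
    if best is None:
        return None
    return " ".join(tokens[best[0]:best[1]])


result = extract_target_term(
    "wireless headphones",
    "auriculares inalámbricos con cancelación de ruido",
    toy_score,
)
print(result)
```

Passing the scorer in as an argument keeps the span search independent of how similarity is computed, so a learned scorer could replace `toy_score` without touching the rest of the sketch.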
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices · Lexicography and Language Studies
