A Comprehensive Analysis of Static Word Embeddings for Turkish

Karahan Sarıtaş; Cahid Arda Öz; Tunga Güngör

arXiv:2405.07778 · cs.CL · May 14, 2024

TL;DR

This study compares static and contextual word embeddings for Turkish in intrinsic and extrinsic NLP tasks, providing insights into their suitability and creating a public Turkish embedding repository.
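
Intrinsic evaluation commonly includes word-similarity benchmarks. A model scores each word pair by the cosine similarity of its vectors, and those scores are then compared with human ratings using Spearman's rank correlation. The sketch below is a minimal illustration under stated assumptions. The three-dimensional vectors and the human ratings are invented toy data rather than the paper's benchmarks or results, and the paper's exact evaluation protocol may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy static embeddings (real ones have hundreds of dimensions).
embeddings = {
    "kedi": [0.9, 0.1, 0.0],       # cat
    "köpek": [0.8, 0.2, 0.1],      # dog
    "araba": [0.1, 0.9, 0.2],      # car
    "otomobil": [0.1, 0.85, 0.2],  # automobile
}

# Hypothetical human similarity ratings on a 0-10 scale.
gold_pairs = [
    ("kedi", "köpek", 7.0),
    ("araba", "otomobil", 9.5),
    ("kedi", "araba", 1.0),
    ("köpek", "otomobil", 1.5),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in gold_pairs]
human_scores = [score for _, _, score in gold_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f}")
```

Because the toy vectors order the four pairs exactly as the ratings do, rho is 1.0 here. Real embeddings and real benchmark ratings give lower values, and that correlation is the number such benchmarks report.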

Contribution

It is the first comprehensive comparison of static and contextual embeddings specifically for Turkish, including detailed syntactic and semantic analysis.
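
Syntactic and semantic analysis of embeddings is often done with word analogies. In an agglutinative language such as Turkish, plural inflection gives a natural syntactic analogy: kitap : kitaplar :: ev : ? (book : books :: house : ?). The sketch below shows the standard 3CosAdd rule. The toy vectors are hypothetical and hand-built so that the plural direction is explicit. It is not claimed to reproduce the paper's analogy set or scoring method.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(u, v):
    """Cosine similarity via the dot product of unit vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def solve_analogy(emb, a, b, c):
    """3CosAdd: return the word d maximising cos(d, b - a + c), excluding a, b, c."""
    na, nb, nc = normalize(emb[a]), normalize(emb[b]), normalize(emb[c])
    target = [y - x + z for x, y, z in zip(na, nb, nc)]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Hypothetical toy vectors; dimensions: [book, house, plural, car].
embeddings = {
    "kitap": [1.0, 0.0, 0.0, 0.0],     # book
    "kitaplar": [1.0, 0.0, 1.0, 0.0],  # books
    "ev": [0.0, 1.0, 0.0, 0.0],        # house
    "evler": [0.0, 1.0, 1.0, 0.0],     # houses
    "araba": [0.0, 0.0, 0.0, 1.0],     # car
    "arabalar": [0.0, 0.0, 1.0, 1.0],  # cars
}

print(solve_analogy(embeddings, "kitap", "kitaplar", "ev"))  # prints: evler
```

Excluding the three query words is the usual convention. Without it, the nearest vector to b - a + c is often one of the query words itself.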

Findings

01

Static and contextual models show different strengths in syntactic and semantic tasks.

02

The study provides a Turkish word embedding repository for future research.

03

The study offers insights into how suitable different embedding models are for various NLP tasks in Turkish.

Abstract

Word embeddings are fixed-length, dense, and distributed word representations used in natural language processing (NLP) applications. There are two main types of word embedding models: non-contextual (static) models and contextual models. The former generates a single embedding for a word regardless of its context, while the latter produces distinct embeddings for a word based on the specific contexts in which it appears. Many works compare contextual and non-contextual embedding models within their respective groups in different languages. However, very few studies compare models across these two groups, and no such study exists for Turkish. Such a cross-group comparison requires converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several…
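
A simple way to turn contextual embeddings into static ones is to run a contextual model over a corpus and average each word's vectors over all of its occurrences. The sketch below illustrates only this pooling baseline. It is not necessarily the conversion used in the paper. The released model's name, bert-turkish-x2static, points to an X2Static-style approach, which learns static vectors with a training objective rather than by plain averaging. `toy_contextual_encoder` is a hypothetical stand-in for a model such as BERT, and its two-dimensional vectors are invented.

```python
from collections import defaultdict

# Hypothetical base vectors for the toy encoder below (not from any real model).
BASE = {
    "yüz": [1.0, 0.0],    # ambiguous in Turkish: "face", "hundred", or "swim!"
    "güzel": [0.0, 1.0],  # beautiful
    "lira": [0.5, 0.5],   # lira (currency)
}


def toy_contextual_encoder(tokens):
    """Hypothetical contextual encoder: each token's vector mixes its own base
    vector with the mean of its immediate neighbours, so the same word gets
    different vectors in different sentences."""
    vectors = []
    for i, tok in enumerate(tokens):
        own = BASE[tok]
        neighbours = [BASE[t] for t in tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]]
        if neighbours:
            ctx = [sum(dim) / len(neighbours) for dim in zip(*neighbours)]
        else:
            ctx = [0.0] * len(own)
        vectors.append([o + 0.5 * c for o, c in zip(own, ctx)])
    return vectors


def pool_static_embeddings(corpus, encoder):
    """Average each word's contextual vectors over all of its occurrences."""
    sums = {}
    counts = defaultdict(int)
    for sentence in corpus:
        for tok, vec in zip(sentence, encoder(sentence)):
            if tok not in sums:
                sums[tok] = [0.0] * len(vec)
            sums[tok] = [s + v for s, v in zip(sums[tok], vec)]
            counts[tok] += 1
    return {tok: [s / counts[tok] for s in sums[tok]] for tok in sums}


# "güzel yüz" (beautiful face) and "yüz lira" (a hundred lira).
corpus = [["güzel", "yüz"], ["yüz", "lira"]]
static = pool_static_embeddings(corpus, toy_contextual_encoder)
print(static["yüz"])  # prints: [1.125, 0.375]
```

The two occurrences of "yüz" receive different contextual vectors, [1.0, 0.5] and [1.25, 0.25]. The pooled static vector blends both senses into one point, which is the information a static model gives up by design.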

Code & Models

Repositories

turkish-word-embeddings/word-embeddings-repository-for-turkish
PyTorch · Official

Models

CahidArda/bert-turkish-x2static (Hugging Face model)