Spanish Pre-trained BERT Model and Evaluation Data

José Cañete; Gabriel Chaperon; Rodrigo Fuentes; Jou-Hui Ho; Hojin Kang; Jorge Pérez

arXiv:2308.02976 · cs.CL · August 8, 2023 · 336 citations

TL;DR

This paper introduces a Spanish-specific BERT model and a comprehensive set of evaluation tasks, improving performance on Spanish NLP benchmarks and providing resources for future research.

Contribution

The paper presents a BERT-based language model pre-trained exclusively on Spanish data, together with a GLUE-style compilation of Spanish evaluation tasks in a single repository. Both are publicly released as resources for Spanish language processing.

Findings

1. The Spanish BERT model outperforms BERT-based models pre-trained on multilingual corpora on most tasks.
2. It achieves new state-of-the-art results on some of the Spanish benchmarks.
3. The model, the pre-training data, and the benchmark compilation are publicly released.

Abstract

Spanish is one of the five most widely spoken languages in the world. Nevertheless, finding resources to train or evaluate Spanish language models is not an easy task. In this paper we help bridge this gap by presenting a BERT-based language model pre-trained exclusively on Spanish data. As a second contribution, we compiled several tasks specifically for the Spanish language in a single repository, much in the spirit of the GLUE benchmark. By fine-tuning our pre-trained Spanish model, we obtain better results than other BERT-based models pre-trained on multilingual corpora for most of the tasks, even achieving a new state of the art on some of them. We have publicly released our model, the pre-training data, and the compilation of Spanish benchmarks.

Code & Models

Models

ignacio-ave/beto-sentiment-analysis-spanish (Hugging Face model · 81k downloads · 6 likes)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis