The EcoLexicon English Corpus as an open corpus in Sketch Engine

Pilar Leon-Arauz; Antonio San Martin; Arianne Reimerink

arXiv:1807.05797·cs.CL·July 17, 2018·51 cites

Open Access

TL;DR

This paper introduces the EcoLexicon English Corpus, a 23.1-million-word corpus of contemporary environmental texts. It is available as an open corpus in Sketch Engine, so any user can query and analyze it free of charge, even without a subscription.

Contribution

The paper describes how the EcoLexicon English Corpus (EEC) was compiled and how it can be accessed and queried in Sketch Engine, which makes environmental language data openly available to the community.

Findings

01. The EEC contains 23.1 million words of contemporary environmental texts.

02. It is freely accessible and queryable in Sketch Engine, even without a subscription.

03. The texts in the corpus are classified according to a set of parameters that can be used when querying it.

Abstract

The EcoLexicon English Corpus (EEC) is a 23.1-million-word corpus of contemporary environmental texts. It was compiled by the LexiCon research group for the development of EcoLexicon (Faber, Leon-Arauz & Reimerink 2016; San Martin et al. 2017), a terminological knowledge base on the environment. It is available as an open corpus in the well-known corpus query system Sketch Engine (Kilgarriff et al. 2014), which means that any user, even without a subscription, can freely access and query the corpus. In this paper, the EEC is introduced by describing how it was built and compiled and how it can be queried and exploited, based both on the functionalities provided by Sketch Engine and on the parameters in which the texts in the EEC are classified.

Taxonomy

Topics: Linguistics and Terminology Studies · Natural Language Processing Techniques · Lexicography and Language Studies