The EcoLexicon English Corpus as an open corpus in Sketch Engine
Pilar Leon-Arauz, Antonio San Martin, Arianne Reimerink

TL;DR
This paper introduces the EcoLexicon English Corpus, a large open-access environmental text corpus integrated into Sketch Engine, enabling free querying and analysis for researchers and users interested in environmental language data.
Contribution
It details the construction, compilation, and accessibility of the EEC within Sketch Engine, facilitating open access to environmental language data for the community.
Findings
The EEC contains 23.1 million words of environmental texts.
It is freely accessible and queryable via Sketch Engine.
The texts in the corpus are classified by a set of parameters that can be used when querying it.
Abstract
The EcoLexicon English Corpus (EEC) is a 23.1-million-word corpus of contemporary environmental texts. It was compiled by the LexiCon research group for the development of EcoLexicon (Faber, Leon-Arauz & Reimerink 2016; San Martin et al. 2017), a terminological knowledge base on the environment. It is available as an open corpus in the well-known corpus query system Sketch Engine (Kilgarriff et al. 2014), which means that any user, even without a subscription, can freely access and query the corpus. This paper introduces the EEC by describing how it was built and compiled and how it can be queried and exploited, based both on the functionalities provided by Sketch Engine and on the parameters by which the texts in the EEC are classified.
Taxonomy
Topics: Linguistics and Terminology Studies · Natural Language Processing Techniques · Lexicography and Language Studies
