Environmental large language model Evaluation (ELLE) dataset: A Benchmark for Evaluating Generative AI applications in Eco-environment Domain

Jing Guo; Nan Li; Ming Xu

arXiv:2501.06277 · cs.CL · January 14, 2025

TL;DR

The paper introduces the ELLE dataset, a comprehensive benchmark for evaluating large language models in ecological and environmental applications, facilitating standardized assessments and promoting sustainable AI solutions.

Contribution

It presents the first dedicated evaluation dataset for AI in environmental sciences, enabling consistent performance measurement across diverse ecological topics.

Findings

1. The ELLE dataset includes 1,130 QA pairs across 16 environmental topics.
2. It provides a standardized benchmark for AI performance in ecological applications.
3. It supports the development of sustainable AI solutions in environmental sciences.

Abstract

Generative AI holds significant potential for ecological and environmental applications such as monitoring, data analysis, education, and policy support. However, its effectiveness is limited by the lack of a unified evaluation framework. To address this, we present the Environmental Large Language model Evaluation (ELLE) question-answer (QA) dataset, the first benchmark designed to assess large language models and their applications in ecological and environmental sciences. The ELLE dataset includes 1,130 question-answer pairs across 16 environmental topics, categorized by domain, difficulty, and type. This comprehensive dataset standardizes performance assessments in these fields, enabling consistent and objective comparisons of generative AI performance. By providing a dedicated evaluation tool, the ELLE dataset promotes the development and application of generative AI technologies for…
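
Because every QA pair is tagged by domain, difficulty, and type, results can be reported per category rather than only as a single overall score. The sketch below shows one way to tally per-category accuracy. It rests on several assumptions. The QAItem field names are hypothetical and are not taken from the ceeai/elle repository. The normalized exact-match metric is a placeholder rather than the paper's scoring protocol. The sample questions are made-up toy items, not entries from ELLE.

```python
from collections import defaultdict
from dataclasses import dataclass
import re


@dataclass
class QAItem:
    # Hypothetical record layout; the real field names in ceeai/elle may differ.
    question: str
    reference: str
    domain: str
    difficulty: str
    qtype: str


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    # Placeholder metric: ELLE's own scoring protocol is not assumed here.
    return normalize(prediction) == normalize(reference)


def accuracy_by(items, predictions, key):
    """Return {category: accuracy}, grouping items by the named QAItem attribute."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        group = getattr(item, key)
        total[group] += 1
        correct[group] += exact_match(pred, item.reference)
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    # Toy items for illustration only; they are not drawn from the ELLE dataset.
    items = [
        QAItem("Which gas is the main driver of anthropogenic warming?",
               "Carbon dioxide", "climate", "easy", "factual"),
        QAItem("What does BOD stand for in water quality?",
               "Biochemical oxygen demand", "water", "medium", "factual"),
        QAItem("Which gas forms the stratospheric ozone layer?",
               "Ozone", "atmosphere", "easy", "factual"),
    ]
    predictions = ["carbon dioxide.", "Biological oxygen demand", "ozone"]
    print(accuracy_by(items, predictions, "difficulty"))
    # → {'easy': 1.0, 'medium': 0.0}
```

Passing "domain" or "qtype" as the key gives the other two breakdowns. In practice, free-text answers usually call for a softer metric than exact match, such as a rubric or model-based judge.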


Code & Models

Repositories

ceeai/elle (official)
