XferBench: a Data-Driven Benchmark for Emergent Language

Brendon Boldt; David Mortensen

arXiv:2407.03456 · cs.CL · July 8, 2024

TL;DR

This paper presents XferBench, a data-driven benchmark that evaluates emergent languages by their similarity to human language. Quality is measured by how well a model pretrained on the emergent language performs on downstream NLP tasks in human language.

Contribution

The paper introduces a novel benchmark and Python package for assessing emergent language quality based on downstream NLP task performance.
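The benchmark scores a language by how well pretraining on it transfers to downstream tasks, so its final step is a comparison of candidate corpora. As a minimal sketch of that comparison step only (model training is omitted), the hypothetical `rank_corpora` function below averages each corpus's per-task downstream scores and orders the corpora best-first. The function name, the plain-mean aggregation, and the score inputs are assumptions for illustration, not XferBench's actual API or aggregation rule.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def rank_corpora(
    scores: Dict[str, List[float]], higher_is_better: bool = True
) -> List[Tuple[str, float]]:
    """Rank candidate pretraining corpora by mean downstream score (hypothetical).

    `scores` maps a corpus name to its scores on several downstream tasks.
    Pass `higher_is_better=False` if the metric is a loss such as cross-entropy.
    """
    means: Dict[str, float] = {}
    for name, task_scores in scores.items():
        if not task_scores:
            raise ValueError(f"no downstream scores for corpus {name!r}")
        # Assumed aggregation: an unweighted mean over downstream tasks.
        means[name] = fmean(task_scores)
    # Best-first order; sorted() is stable, so ties keep insertion order.
    return sorted(means.items(), key=lambda kv: kv[1], reverse=higher_is_better)
```

The `higher_is_better` flag keeps the sketch neutral about whether the downstream metric is an accuracy-style score or a loss; only the direction of the sort changes.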

Findings

01

The benchmark's validity is tested empirically against human, synthetic, and emergent language baselines.

02

Emergent languages show varying degrees of similarity to human language.

03

The Python package lets researchers evaluate an emergent language given only a text file of its utterances.

Abstract

In this paper, we introduce a benchmark for evaluating the overall quality of emergent languages using data-driven methods. Specifically, we interpret the notion of the "quality" of an emergent language as its similarity to human language within a deep learning framework. We measure this by using the emergent language as pretraining data for downstream NLP tasks in human language: the better the downstream performance, the better the emergent language. We implement this benchmark as an easy-to-use Python package that requires only a text file of utterances from the emergent language to be evaluated. Finally, we empirically test the benchmark's validity using human, synthetic, and emergent language baselines.
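The abstract states that the package needs only a text file of utterances, but the exact file format is not given on this page. The sketch below therefore assumes one utterance per line with tokens written as whitespace-separated integer IDs; `write_utterances` and `read_utterances` are hypothetical helpers for preparing and checking such a file, not functions from the XferBench package.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed format (not the package's documented spec): one utterance per line,
# tokens written as whitespace-separated integer IDs.


def write_utterances(path: Path, utterances: Iterable[List[int]]) -> int:
    """Write utterances to `path`, one per line; return how many were written."""
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for utt in utterances:
            if not utt:
                raise ValueError("empty utterance")
            f.write(" ".join(str(tok) for tok in utt) + "\n")
            count += 1
    return count


def read_utterances(path: Path) -> List[List[int]]:
    """Read the file back into lists of token IDs, skipping blank lines."""
    utterances: List[List[int]] = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if tokens:
                utterances.append([int(tok) for tok in tokens])
    return utterances
```

Round-tripping a corpus through these two helpers is a cheap way to confirm that an agent's outputs were serialized as intended before handing the file to any evaluation tool.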

Videos

XferBench: a Data-Driven Benchmark for Emergent Language · underline

Taxonomy

Topics: Scientific Computing and Data Management · Semantic Web and Ontologies · Computational Physics and Python Applications