EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling

Siyu Ren; Zhiyong Wu; Kenny Q. Zhu

arXiv:2310.04691 · cs.CL · February 7, 2024 · 1 citation

TL;DR

This paper introduces Earth Mover Distance Optimization (EMO), a training method for auto-regressive language models. EMO addresses limitations of maximum likelihood estimation by better aligning the model and human language distributions, improving language modeling and downstream task performance over MLE.

Contribution

The paper proposes EMO, a new training approach based on earth mover distance, with a feasible upper bound for efficient training, improving language modeling and downstream task performance.

Findings

01. EMO outperforms MLE in language modeling across various domains.

02. EMO enhances downstream task performance with minimal fine-tuning.

03. EMO serves as an effective lightweight calibration method for large-scale models.

Abstract

Neural language models are probabilistic models of human text. They are predominantly trained using maximum likelihood estimation (MLE), which is equivalent to minimizing the forward cross-entropy between the empirical data distribution and the model distribution. However, various degeneration phenomena are still widely observed when decoding from the distributions learned by such models. We establish that the forward cross-entropy is suboptimal as a distance metric for aligning the human and model distributions due to its (1) recall-prioritization, (2) negative diversity ignorance, and (3) train-test mismatch. In this paper, we propose Earth Mover Distance Optimization (EMO) for auto-regressive language modeling. EMO capitalizes on the inherent properties of earth mover distance to address the aforementioned challenges. Due to the high complexity of direct computation, we further introduce a…
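The contrast the abstract draws between forward cross-entropy and earth mover distance can be illustrated with a toy example. The sketch below is not the paper's implementation: the three-token vocabulary, the 2-d embeddings, the cosine-distance cost, and all function names are hypothetical choices made for illustration. It uses the fact that the independent coupling gamma[i][j] = p[i] * q[j] is always a valid transport plan, so its cost is an upper bound on the earth mover distance (and is exact when p is one-hot, since no other plan exists). Whether this matches the upper bound the abstract refers to is not confirmed by the truncated text above.

```python
import math

def forward_cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i). With a one-hot p this is the usual MLE token loss."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q) if pi > 0)

def cosine_distance(u, v):
    """1 - cosine similarity; used here as an assumed token-to-token transport cost."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def emd_upper_bound(p, q, cost):
    """Cost of the independent coupling gamma[i][j] = p[i] * q[j].

    Any valid coupling costs at least as much as the optimal one,
    so this value is an upper bound on the earth mover distance.
    """
    n = len(p)
    return sum(p[i] * q[j] * cost[i][j] for i in range(n) for j in range(n))

# Hypothetical vocabulary: token 1 is a near-synonym of token 0, token 2 is unrelated.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
cost = [[cosine_distance(u, v) for v in embeddings] for u in embeddings]

p = [1.0, 0.0, 0.0]        # one-hot data distribution (target is token 0)
q_near = [0.5, 0.5, 0.0]   # model puts its remaining mass on the near-synonym
q_far = [0.5, 0.0, 0.5]    # model puts its remaining mass on the unrelated token

ce_near = forward_cross_entropy(p, q_near)
ce_far = forward_cross_entropy(p, q_far)
bound_near = emd_upper_bound(p, q_near, cost)
bound_far = emd_upper_bound(p, q_far, cost)

print(f"cross-entropy: near={ce_near:.4f} far={ce_far:.4f}")
print(f"EMD bound:     near={bound_near:.4f} far={bound_far:.4f}")
```

In this toy setup the cross-entropy is identical for both model distributions, because it only looks at the probability of the target token, while the transport-based quantity is smaller when the leftover mass sits on a token close to the target. This is one way to read the "negative diversity ignorance" point in the abstract.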

Code & Models

Repositories

drsy/emo (PyTorch, official)

Videos

EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis