Evaluating Large Language Models for Generalization and Robustness via Data Compression

Yucheng Li; Yunhao Guo; Frank Guerin; Chenghua Lin

arXiv:2402.00861 · cs.CL · February 6, 2024 · 2 cites

TL;DR

This paper introduces a lossless data-compression-based method for evaluating large language models. The method assesses their generalization and robustness across diverse data sources and time periods, and it avoids limitations of existing benchmarks such as data contamination, sensitivity to prompts, and the high cost of benchmark creation.

Contribution

It proposes a compression-based evaluation framework that measures how well models generalize to data published after their training cutoff and how robust their performance remains over time. The framework is applied to a test corpus spanning 83 months (2017 to 2023) and to 14 models of various sizes.
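To make the idea of a compression-based score concrete, here is a minimal Python sketch. It assumes (this page does not confirm it) that the score is the ideal arithmetic-coding length of a text under the model, divided by the text's raw size. With arithmetic coding, a token the model predicts with probability p costs about -log2(p) bits, so a better predictor produces a shorter encoding. The function name `compression_rate` and its inputs (per-token probabilities already extracted from the model, and the byte length of the raw text) are hypothetical, and the small constant overhead of a real arithmetic coder is ignored.

```python
import math
from typing import Sequence


def compression_rate(token_probs: Sequence[float], raw_num_bytes: int) -> float:
    """Approximate lossless compression rate of a text under a language model.

    token_probs: probability the model assigned to each actual next token
                 (hypothetical input; obtaining it from a model is not shown).
    raw_num_bytes: size of the original text in bytes.

    Returns compressed bits / raw bits, so lower values mean better compression.
    """
    if raw_num_bytes <= 0:
        raise ValueError("raw_num_bytes must be positive")
    bits = 0.0
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid probability: {p}")
        # Ideal arithmetic-coding cost of this token, in bits.
        bits -= math.log2(p)
    return bits / (8 * raw_num_bytes)
```

For example, three tokens predicted with probabilities 0.5, 0.25 and 0.125 cost 1 + 2 + 3 = 6 bits; if the text is 3 bytes (24 bits) long, the rate is 6 / 24 = 0.25. Normalizing by raw bytes rather than by token count keeps the score comparable across models with different tokenizers.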

Findings

01

Models' compression performance declines on data from after their training cutoff, indicating limited generalization to unseen data.

02

Mistral and Llama-2 show a good balance of performance and robustness.

03

Models perform better on arXiv papers than on news and code data.

Abstract

Existing methods for evaluating large language models face challenges such as data contamination, sensitivity to prompts, and the high cost of benchmark creation. To address these challenges, we propose a lossless data-compression-based evaluation approach that tests how models' predictive abilities generalize after their training cutoff. Specifically, we collect comprehensive test data spanning 83 months from 2017 to 2023 and split the data into training and testing periods according to each model's training data cutoff. We measure: 1) the compression performance on the testing period as a measure of generalization on unseen data; and 2) the performance gap between the training and testing periods as a measure of robustness. Our experiments test 14 representative large language models of various sizes on sources including Wikipedia, news articles, code, arXiv papers, and multi-modal data. We find that…
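The two measures described in the abstract can be sketched in Python as follows, assuming a compression rate (lower is better) has already been computed for each month of test data. The function name `generalization_and_robustness`, the "YYYY-MM" month keys, the choice of a plain mean over months, and treating the cutoff month as part of the training period are all assumptions made for illustration; the paper's exact aggregation may differ.

```python
from statistics import mean
from typing import Dict, Tuple


def generalization_and_robustness(
    monthly_rates: Dict[str, float], cutoff: str
) -> Tuple[float, float]:
    """Split per-month compression rates at a model's training cutoff.

    monthly_rates: maps a zero-padded "YYYY-MM" month to a compression rate
                   (lower is better); hypothetical input format.
    cutoff: last month assumed to be covered by the training data.

    Returns (generalization, gap):
      generalization = mean rate over the testing period (after the cutoff)
      gap            = testing mean minus training mean; a larger positive
                       gap means performance degrades more on unseen months.
    """
    # Zero-padded "YYYY-MM" strings sort chronologically, so string
    # comparison is enough to assign each month to a period.
    train = [rate for month, rate in monthly_rates.items() if month <= cutoff]
    test = [rate for month, rate in monthly_rates.items() if month > cutoff]
    if not train or not test:
        raise ValueError("both periods need at least one month of data")
    test_mean = mean(test)
    return test_mean, test_mean - mean(train)
```

Under these assumptions, a model that generalizes well has a low testing-period rate, and a robust model has a gap close to zero, meaning its compression performance barely changes once the data postdates its training cutoff.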

Code & Models

Repositories

liyucheng09/llm-compressive
pytorch · Official

Taxonomy

Topics: Natural Language Processing Techniques