Word Matters: What Influences Domain Adaptation in Summarization?

Yinghao Li; Siyu Miao; Heyan Huang; Yang Gao

arXiv:2406.14828 · cs.CL · June 24, 2024

TL;DR

This paper explores how specific word-level factors influence domain adaptation in summarization, proposing a new measure of dataset difficulty and demonstrating its predictive power for model performance across domains.

Contribution

It introduces a word-based dataset difficulty metric and shows that, once this difficulty is taken into account, cross-domain overlap relates approximately linearly to performance gain. This enables performance prediction without retraining.
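The page does not say how the prediction is actually carried out. The sketch below is one minimal way to picture it. It assumes an ordinary least-squares line of performance gain against cross-domain overlap. `fit_line` and `predict_gain` are hypothetical helpers, not part of the paper or its repository. The sketch also does not model how dataset difficulty enters the relationship.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept.

    ASSUMPTION: a plain linear fit of gain vs. overlap stands in for the
    paper's (unspecified here) prediction procedure.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means the slope is undefined.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Co-variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_gain(overlap: float, slope: float, intercept: float) -> float:
    """Predict the performance gain for a new domain from its overlap score."""
    return slope * overlap + intercept
```

The synthetic points (0, 1), (1, 3), (2, 5), (3, 7) lie exactly on y = 2x + 1. For these points, `fit_line` returns a slope of 2.0 and an intercept of 1.0, and `predict_gain(4, 2.0, 1.0)` gives 9.0. These numbers are illustrative only and are not results from the paper.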

Findings

01 Cross-domain overlap correlates approximately linearly with performance gain once dataset difficulty is taken into account.

02 Word-based compression rate and abstraction level determine dataset difficulty.

03 Dataset difficulty predicts model performance on unseen domains.
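The findings do not define cross-domain overlap precisely. The sketch below is an illustration only. It assumes overlap is the fraction of target-domain word types that also appear in the training-domain text, using a simple lowercase regex tokenizer. The helper names are hypothetical, and the paper may measure overlap differently.

```python
import re
from typing import Iterable, Set


def vocabulary(texts: Iterable[str]) -> Set[str]:
    """Collect lowercase word types with a simple regex tokenizer (assumed)."""
    vocab: Set[str] = set()
    for text in texts:
        vocab.update(re.findall(r"[a-z0-9']+", text.lower()))
    return vocab


def cross_domain_overlap(train_texts: Iterable[str], target_texts: Iterable[str]) -> float:
    """Fraction of target-domain word types that also occur in the training domain.

    ASSUMPTION: overlap is measured on word types (a set), not token counts.
    """
    train_vocab = vocabulary(train_texts)
    target_vocab = vocabulary(target_texts)
    if not target_vocab:
        return 0.0
    return len(target_vocab & train_vocab) / len(target_vocab)
```

Take the training text "The cat sat on the mat." and the target text "The dog sat.". The target has three word types, and two of them ("the", "sat") also occur in the training text, so the overlap under this definition is 2/3.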

Abstract

Domain adaptation aims to enable Large Language Models (LLMs) to generalize effectively to domain datasets unseen during training. However, factors such as model size (number of parameters) and the scale of training data are general influences and do not capture the nuances of domain adaptation performance. This paper investigates the fine-grained factors affecting domain adaptation performance, analyzing the specific impact of "words" in training data on summarization tasks. We propose quantifying dataset learning difficulty as the learning difficulty of generative summarization, which is determined by two indicators: word-based compression rate and abstraction level. Our experiments show that, when dataset learning difficulty is taken into account, cross-domain overlap and performance gain in summarization tasks exhibit an approximately linear relationship, which is not…
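The abstract names the two indicators but gives no formulas. The sketch below is a minimal illustration based on two assumed definitions:

- Compression rate is summary length divided by document length, counted in words.
- Abstraction level is the share of summary tokens that do not appear in the source document.

The function names, the tokenizer and both definitions are assumptions. The sketch does not reproduce how the paper combines the two indicators into a single difficulty score.

```python
import re
from statistics import mean
from typing import List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens from a simple regex (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def compression_rate(document: str, summary: str) -> float:
    """ASSUMED definition: summary word count divided by document word count."""
    doc_len = len(tokenize(document))
    if doc_len == 0:
        raise ValueError("document has no words")
    return len(tokenize(summary)) / doc_len


def abstraction_level(document: str, summary: str) -> float:
    """ASSUMED definition: share of summary tokens never seen in the document."""
    summary_tokens = tokenize(summary)
    if not summary_tokens:
        return 0.0
    doc_vocab = set(tokenize(document))
    novel = sum(1 for tok in summary_tokens if tok not in doc_vocab)
    return novel / len(summary_tokens)


def dataset_indicators(pairs: Sequence[Tuple[str, str]]) -> Tuple[float, float]:
    """Average both indicators over (document, summary) pairs in one dataset."""
    rates = [compression_rate(doc, summ) for doc, summ in pairs]
    levels = [abstraction_level(doc, summ) for doc, summ in pairs]
    return mean(rates), mean(levels)
```

Averaging over pairs yields one value of each indicator per dataset. This matches the dataset-level granularity at which the abstract discusses learning difficulty.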

Code & Models

Repositories

li-aolong/Word-Matters (official)

Videos

Word Matters: What Influences Domain Adaptation in Summarization? (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems