Coverage-based Fairness in Multi-document Summarization

Haoyuan Li; Yusen Zhang; Rui Zhang; Snigdha Chaturvedi

arXiv:2412.08795 · cs.CL · March 26, 2025

TL;DR

This paper introduces new fairness measures for multi-document summarization that account for redundancy and corpus-level disparities, and evaluates various language models using these measures.

Contribution

The paper proposes two measures, Equal Coverage and Coverage Parity, which address limitations of earlier fairness metrics for multi-document summarization (MDS), and applies them to evaluate multiple LLMs.

Findings

01 Claude3-sonnet is the fairest LLM evaluated.

02 Most LLMs tend to overrepresent certain social attribute values.

03 In human evaluations, the new measures align better with fairness definitions.

Abstract

Fairness in multi-document summarization (MDS) measures whether a system can generate a summary fairly representing information from documents with different social attribute values. Fairness in MDS is crucial since a fair summary can offer readers a comprehensive view. Previous works focus on quantifying summary-level fairness using Proportional Representation, a fairness measure based on Statistical Parity. However, Proportional Representation does not consider redundancy in input documents and overlooks corpus-level unfairness. In this work, we propose a new summary-level fairness measure, Equal Coverage, which is based on coverage of documents with different social attribute values and considers the redundancy within documents. To detect the corpus-level unfairness, we propose a new corpus-level measure, Coverage Parity. Our human evaluations show that our measures align more with…

Code & Models

Repositories

leehaoyuan/coverage_fairness (PyTorch, official)

Videos

Coverage-based Fairness in Multi-document Summarization · underline

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Semantic Web and Ontologies

Methods: ALIGN · Focus