A survey of diversity quantification in natural language processing: The why, what, where and how
Louis Estève, Marie-Catherine de Marneffe, Nurit Melnik, Agata Savary, Olha Kanishcheva

TL;DR
This survey reviews how diversity is conceptualized and measured in NLP, proposing a unified framework to improve consistency, understanding, and application across the field.
Contribution
The paper introduces an NLP-specific framework, built on Stirling (2007) and a survey of over 300 diversity-related papers from the ACL Anthology, that clarifies how diversity is conceptualized and measured.
Findings
Diversity is often addressed ad hoc in NLP research, with few explicit justifications and many cross-paper inconsistencies.
The proposed framework clarifies the 'why', 'what', 'where', and 'how' of diversity.
The survey offers recommendations for standardizing diversity measurement in NLP.
Abstract
The concept of diversity has received increasing attention in natural language processing (NLP) in recent years. It became an advocated property of datasets and systems, and many measures are used to quantify it. However, it is often addressed in an ad hoc manner, with few explicit justifications of its endorsement and many cross-paper inconsistencies. There have been very few attempts to take a step back and understand the conceptualization of diversity in NLP. To address this fragmentation, we take inspiration from other scientific fields where the concept of diversity has been more thoroughly conceptualized. We build upon Stirling (2007), a unified framework adapted from ecology and economics, which distinguishes three dimensions of diversity: variety, balance, and disparity. We survey over 300 recent diversity-related papers from ACL Anthology and build an NLP-specific framework…
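To make the three dimensions named in the abstract concrete, here is a minimal Python sketch of one common way each could be quantified over a bag of tokens. It is not the paper's own method: the function names, the choice of Shannon evenness for balance, the mean pairwise distance for disparity, and the toy `length_distance` function are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def variety(tokens):
    """Variety: the number of distinct types observed."""
    return len(set(tokens))


def balance(tokens):
    """Balance as Shannon evenness: entropy divided by its maximum, in [0, 1]."""
    counts = Counter(tokens)
    total = sum(counts.values())
    num_types = len(counts)
    if num_types < 2:
        return 0.0  # convention chosen here: evenness is undefined for < 2 types
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_types)


def disparity(tokens, distance):
    """Disparity as the mean pairwise distance between distinct types."""
    types = sorted(set(tokens))
    pairs = list(combinations(types, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def length_distance(a, b):
    """Hypothetical toy distance; a real study might use embedding distance."""
    return abs(len(a) - len(b))


sample = ["cat", "cat", "dog", "horse"]
print(variety(sample), balance(sample), disparity(sample, length_distance))
```

The three functions are kept separate on purpose: a collection can score high on one dimension and low on another, for example many types (high variety) dominated by a single frequent one (low balance).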
Taxonomy
Topics: Language and cultural evolution · Text Readability and Simplification · Natural Language Processing Techniques
